Abstract

Approximate Bayesian computation (ABC) methods have become increasingly prevalent
of late, facilitating as they do the analysis of intractable, or challenging, statistical
problems. With the initial focus being primarily on the practical import of ABC, exploration
of its formal statistical properties has begun to attract more attention. The aim
of this paper is to establish general conditions under which ABC methods are Bayesian
consistent, in the sense of producing draws that yield a degenerate posterior distribution
at the true parameter (vector) asymptotically (in the sample size). We derive conditions
under which arbitrary summary statistics yield consistent inference in the Bayesian sense,
with these conditions linked to identification of the true parameters. Using simple illustrative
examples that have featured in the literature, we demonstrate that identification, and
hence consistency, is unlikely to be achieved in many cases, and propose a simple diagnostic
procedure that can indicate the presence of this problem. We also formally explore the
link between consistency and the use of auxiliary models within ABC, and illustrate the
subsequent results in the Lotka-Volterra predator-prey model.

1 Introduction

The use of approximate Bayesian computation (ABC) methods in models with intractable
likelihoods has gained increased momentum over recent years, extending beyond the original
applications in the biological sciences. (See Marin et al., 2011, and Sisson and Fan, 2011, for recent
reviews.) Whilst ABC evolved initially as a practical tool, attention has begun to shift to the
investigation of its formal statistical properties, in particular as they relate to the choice of
summary statistics on which the technique typically relies; see, for example, Fearnhead and Prangle
(2012), Gleim and Pigorsch (2013), Marin et al. (2014), Martin et al. (2014) and Martin et al.
(2014).

The aim of this paper is to establish general conditions under which summary statistic-based
ABC methods are Bayesian consistent, in the sense of producing draws that yield a
degenerate distribution at the true parameter (vector) in the (sample size) limit. This aim is
much broader than that underlying Martin et al. (2014), in which standard quasi-likelihood
conditions were invoked to establish the Bayesian consistency of auxiliary model-based versions
of ABC. In particular, we derive the conditions under which arbitrary summary statistics yield
consistent inference, with these conditions linked to the identification of the true parameters in
any particular instance. Using simple illustrative examples that have featured in the literature,
we demonstrate that consistency is not achieved in many cases. This finding calls into doubt
routine applications of the ABC method that are driven primarily by the convenience with
which simple summary statistics can be computed, without further thought being given to the
information content of those summaries.

Consistency by its very nature is more of a “thought experiment” than a practical feature
of an estimation procedure. Nonetheless, consistency is a useful metric with which to gauge the
output of a given statistical procedure. Following Diaconis and Freedman (1986), we argue that
regardless of Bayesian bearing, that is, whether one is a “Classical” Bayesian (who believes in
a “true but unknown parameter which is to be estimated from the data”) or a “Subjective”
Bayesian (who does not believe in true models but, rather, thinks in terms of predictive
distributions), consistency is important for verification and practical implementation of Bayesian
procedures. That is, whilst consistency is a property that sits naturally within the Classical
Bayesian paradigm, it can also be viewed as being important to Subjectivists. To wit, Blackwell
and Dubins (1962) and Diaconis and Freedman (1986) argue that consistency can be viewed
as a “merging of intersubjective opinions” and that consistency of the posterior implies that
two separate subjective Bayesians with different prior beliefs will ultimately end up with similar
predictive distributions.

In what follows, we only concern ourselves with the idea of consistency as it pertains to some
true model that is known up to an unknown vector of parameters. In this setting Bayesian
consistency means that any Bayesian method should yield increasingly accurate posterior
inference as the sample size increases. While the theory of Bayesian consistency for likelihood-based
Bayesian methods is now well documented, at least in the finite-dimensional parameter case, a
thorough study on Bayesian consistency of so-called likelihood-free methods, such as ABC, has
yet to be undertaken. This represents an important gap in the literature, and is one we look to
fill.

Bayesian consistency for posteriors based on finite-dimensional parameters is often derived
under boundedness conditions for the underlying density function of the true model; see, for
example, Le Cam (1953), Ibragimov and Has’minskii (1981), and Ghosal et al. (1995). In the
ABC setting, however, conditions based on the underlying density function are not useful since,
by the nature of the very problems to which ABC is applied, the underlying density is typically
unknown in closed form. To this end, we derive a set of conditions on the summary statistics
chosen within the ABC procedure that, when satisfied, ensure consistency of the posterior
obtained from ABC. These conditions are similar in spirit to those seen in the literature on indirect
inference (Gourieroux et al., 1993). Examples from the ABC literature are used to demonstrate
how the aforementioned conditions can be verified in practice.

The paper proceeds as follows. In Section 2 we briefly outline the basic principles of ABC.
In Section 3 we establish conditions under which ABC will be consistent for the unknown
parameters, and simple examples that respectively do and do not satisfy these conditions are
given. In Section 4 we then propose a practical technique for identifying, in any particular
problem, when the conditions for consistency are (or are not) satisfied. The analysis in Sections
3 and 4 focuses on the typical application of ABC, whereby summary statistics are chosen that
are deemed to contain some information about the parameters of the true model and, more
often than not, are used to define a matching criterion that is a weighted function of sample
moments. We then couch the discussion in terms of a general criterion function, where
the latter derives from an auxiliary model, and which may - but certainly does not need to -
derive from the likelihood function of that approximating model. Following that, we pursue the
matter of consistency when using ABC to conduct inference in systems of ordinary differential
equations (ODEs), with the Lotka-Volterra system for predator and prey used for illustration,
and demonstrate that a common method for obtaining ABC posterior estimates in this setting
does not yield Bayesian consistent inference. The final section concludes. Proofs of two theorems
and one corollary are provided in an appendix to the paper.

2 ABC: an Outline of the Basic Approach 

Suppose we are interested in conducting Bayesian inference on a complex parametric model
indexed by the unknown $p$-dimensional parameter $\theta \in \Theta$, $\Theta \subset \mathbb{R}^p$ compact, and let
$P_\theta$ denote the family of probability measures induced by the model. Assume $P_\theta$ admits a
corresponding conditional density $p(\cdot|\theta)$ and assume we have $T$ observations on the stochastic
process $y_t$, characterized by $p(y|\theta)$, with $y = (y_1, y_2, \ldots, y_T)'$ denoting the $T$-dimensional
vector of observed data. The aim of ABC is to produce draws from an approximation to the
posterior distribution of the unknown $\theta$ given observed data $y$,

$$p(\theta|y) \propto p(y|\theta)\,p(\theta),$$

in the case where both the prior, $p(\theta)$, and the likelihood, $p(y|\theta)$, can be easily simulated.
These draws are used, in turn, to approximate posterior quantities of interest, including marginal
posterior moments, marginal posterior distributions and predictive distributions. The simplest
(accept/reject) form of the algorithm (Tavaré et al., 1997, Pritchard et al., 1999) is detailed in
Algorithm 1.


Algorithm 1 ABC algorithm
1: Simulate $\theta^i$, $i = 1, 2, \ldots, N$, from $p(\theta)$
2: Simulate $z^i = (z_1^i, z_2^i, \ldots, z_T^i)'$, $i = 1, 2, \ldots, N$, from the likelihood, $p(\cdot|\theta^i)$
3: Select $\theta^i$ such that:

   $$d\{\eta(y), \eta(z^i)\} < \varepsilon, \qquad (1)$$

   where $\eta(\cdot)$ is a (vector) statistic, $d\{\cdot,\cdot\}$ is a distance function (or metric), and the
   tolerance level $\varepsilon$ is chosen as small as the computing budget allows.
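The three steps of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the paper: the function name `abc_rejection`, the toy Gaussian location model, the uniform prior on (-3, 3), the sample-mean summary, the Euclidean distance and the values of $\varepsilon$ and $N$ are all hypothetical choices made for the example.

```python
import numpy as np

def abc_rejection(y, prior_sampler, simulator, summary, eps, n_draws, rng):
    """Accept/reject ABC: keep prior draws whose simulated summaries are within eps."""
    eta_y = np.atleast_1d(summary(y))
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)             # Step 1: draw theta^i from p(theta)
        z = simulator(theta, len(y), rng)      # Step 2: draw z^i from p(.|theta^i)
        dist = np.linalg.norm(np.atleast_1d(summary(z)) - eta_y)
        if dist < eps:                         # Step 3: selection rule (1)
            accepted.append(theta)
    return np.array(accepted)

# Toy illustration (hypothetical model): y_t ~ N(theta, 1), summary = sample mean.
rng = np.random.default_rng(0)
y_obs = rng.normal(1.0, 1.0, size=200)
draws = abc_rejection(
    y_obs,
    prior_sampler=lambda r: r.uniform(-3.0, 3.0),
    simulator=lambda th, T, r: r.normal(th, 1.0, size=T),
    summary=np.mean,
    eps=0.1,
    n_draws=5000,
    rng=rng,
)
```

The array `draws` holds the selected $\theta^i$; their empirical distribution is the accept/reject approximation to the posterior conditional on the chosen summary.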


Algorithm 1 thus samples $\theta$ and $z$ from the joint posterior:

$$p_\varepsilon(\theta, z|\eta(y)) = \frac{p(\theta)\,p(z|\theta)\,I_\varepsilon[z(\theta)]}{\int_\Theta \int_z p(\theta)\,p(z|\theta)\,I_\varepsilon[z(\theta)]\,dz\,d\theta},$$

where $I_\varepsilon[z(\theta)] := I[d\{\eta(y), \eta(z(\theta))\} < \varepsilon]$ is one if $d\{\eta(y), \eta(z(\theta))\} < \varepsilon$ and zero else.^1
Clearly, when $\eta(y)$ is a sufficient statistic and $\varepsilon$ is arbitrarily small,

$$p_\varepsilon(\theta|\eta(y)) = \int_z p_\varepsilon(\theta, z|\eta(y))\,dz \qquad (2)$$

approximates the exact posterior, $p(\theta|y)$, and draws from $p_\varepsilon(\theta, z|\eta(y))$ can be used to estimate
features of the true posterior. In practice, however, the complexity of the models to which ABC
is applied implies, almost by definition, that sufficiency is unattainable. Hence, in the limit, as
$\varepsilon \to 0$, the draws can be used only to approximate features of $p(\theta|\eta(y))$.

ABC-based estimates of $p(\theta|y)$ thus suffer from three types of approximation error: one
invoked by the use of summary statistics that are not sufficient for $\theta$; another associated with
the use in practice of a non-zero tolerance, $\varepsilon$, for selecting draws from $p(\theta|\eta(y))$; and, thirdly,
the error produced when using non-parametric density techniques to estimate $p(\theta|\eta(y))$ from a
given set of selected draws. For any level of overall computational burden (i.e., the total number
of draws $N$), reducing $\varepsilon$ comes at a cost of reducing the probability of a draw being accepted,
thereby contributing to the third form of error. The problem is exacerbated the larger is the
dimension of $\eta(y)$; see Blum (2010), Blum et al. (2013) and Nott et al. (2014). In practice
$\varepsilon$ tends to be chosen such that, for a given value of $N$, a certain (small) proportion of draws
of $\theta^i$ are selected, with attempts then made to reduce the third form of error using a variety
of post-sampling (kernel-based) corrections of the draws (Beaumont et al., 2002, Blum, 2010,
Blum and François, 2010). Other work gives emphasis to choosing $\eta(\cdot)$ and/or the selection
mechanism itself in such a way that $p(\theta|\eta(y))$ is a closer match to $p(\theta|y)$, in some sense. This
may involve the replacement of the basic accept/reject scheme with Markov chain Monte Carlo
(MCMC) and/or sequential Monte Carlo (SMC) steps (Marjoram et al., 2003, Sisson et al.,
2007, Beaumont et al., 2009, Toni et al., 2009 and Wegmann et al., 2009); or the selection of a
vector $\eta(\cdot)$ that is more informative in some well-defined sense; see Joyce and Marjoram (2008),
Wegmann et al. (2009), Blum (2010) and Fearnhead and Prangle (2012).

^1 The notation $z(\theta)$ is used to emphasize the dependence of the simulated $z$ on $\theta$.
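The practice described above, of fixing the proportion of accepted draws rather than the tolerance itself, can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name `select_by_quantile` and the stand-in arrays of draws and distances are hypothetical, and the tolerance is taken to be the empirical quantile of the simulated distances (so the boundary draw is kept with a non-strict inequality).

```python
import numpy as np

def select_by_quantile(thetas, distances, accept_frac=0.01):
    """Keep the accept_frac share of draws with the smallest distances."""
    eps = np.quantile(distances, accept_frac)   # tolerance implied by the target share
    keep = distances <= eps
    return thetas[keep], eps

rng = np.random.default_rng(1)
thetas = rng.uniform(-1.0, 1.0, size=10_000)                   # stand-in prior draws
distances = np.abs(thetas - 0.3) + 0.05 * rng.random(10_000)   # stand-in distances
kept, eps = select_by_quantile(thetas, distances, accept_frac=0.01)
```

With this rule the tolerance shrinks automatically as $N$ grows for a fixed acceptance share, which is one way of trading off the second and third forms of error.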

In this latter spirit, and mimicking the frequentist techniques of indirect inference (II)
(Gourieroux et al., 1993, Heggland and Frigessi, 2004) and efficient method of moments (EMM)
(Gallant and Tauchen, 1996), Drovandi et al. (2011), Gleim and Pigorsch (2013), Martin et al.
(2014), Drovandi et al. (2015) and Creel and Kristensen (2015) exploit an approximating model
to produce the summary statistic vector $\eta(\cdot)$. Under certain conditions on the auxiliary model,
asymptotic sufficiency (at least) is attainable via use of the maximum likelihood estimates of
the auxiliary parameters as the matching statistics in the ABC algorithm. Martin et al. (2014) also
prove (for $\varepsilon \to 0$) the (Bayesian) consistency of the ABC approach that uses the MLE of the
parameters of the auxiliary model to define $\eta(\cdot)$, under similar conditions to those used to
prove the consistency of the II method. The authors demonstrate the equivalence (again, as the
tolerance approaches zero) of inference based on the score of the auxiliary model to that based
on the MLE. This equivalence holds for any sample size and, hence, ensures that consistency
is maintained by the (computationally efficient) score-based approach on the satisfaction of the
appropriate conditions.

In this paper we also address the issue of Bayesian consistency, but in the completely general
setting in which $\eta(\cdot)$ comprises an arbitrary vector summary statistic, with elements possibly
including, but not limited to, sample moments of the data, and with $\eta(\cdot)$ not necessarily having
an explicit link to the parameters of an auxiliary model. In the particular situation where $\eta(\cdot)$
forms a vector statistic composed of sample moments, ABC parallels the frequentist method of
simulated moments (McFadden, 1989, Pakes and Pollard, 1989, Duffie and Singleton, 1993). In
the following section we maintain full generality in terms of the definition of $\eta(\cdot)$. In a later
section we then consider the case where the matching criterion is explicitly defined with respect to an
auxiliary model, highlighting the fact that the likelihood function of that model is by no means
the only possible criterion that can be adopted.


3 ABC and Consistency 

3.1 Consistency and Summary Statistics 

Herein, we will only concern ourselves with the Classical ideal of Bayesian consistency: namely,
as more data accumulates the posterior should stabilize around some true value and eventually
collapse to a point mass at the same true value. More formally, for some set $A \subset \Theta$, define the
posterior probability of $A$ as

$$\Pr(\theta \in A|y) = \int_A p(\theta|y)\,d\theta;$$

we then have the following well-known definition.

Definition 1: For true value $\theta = \theta^0$, the posterior density $p(\theta|y)$ is Bayesian consistent if for
any $\delta > 0$ and $\mathcal{N}_\delta(\theta^0)$ an open neighborhood of $\theta^0$, $\Pr(\theta \notin \mathcal{N}_\delta(\theta^0)|y) \to_p 0$ as $T \to \infty$.

Herein, the symbol $\to_p$ denotes convergence in probability, and the symbols $o_p(a_T)$, $O_p(b_T)$,
$\text{plim}_{T \to \infty}$, to be used below, have the usual definition.

Unlike the notion of consistency defined above, Bayesian consistency of posterior densities
obtained from ABC requires not only $T \to \infty$ but $\varepsilon \to 0$ and is particular to the choice of $\eta(\cdot)$
(and, indeed, $d\{\cdot,\cdot\}$). Given this fact, we require a separate definition of Bayesian consistency
for ABC.

Definition 2: For true value $\theta = \theta^0$ and (vector) summary statistics $\eta(\cdot)$ taking values in $\mathcal{B}$, where
$\mathcal{B} \subset \mathbb{R}^d$ and $d \geq p$, the ABC-based posterior density $p_\varepsilon(\theta|\eta(y))$ is Bayesian consistent if
for any $\delta > 0$, $\Pr_\varepsilon(\theta \notin \mathcal{N}_\delta(\theta^0)|\eta(y)) \to_p 0$ as $T \to \infty$ and $\varepsilon \to 0$.

In addition, and in common with the standard definition (Defn. 1), the prior density used in
ABC must be positive at the true value $\theta^0$ and so we will assume the following condition is
satisfied.

Assumption [P]: The prior density $p(\cdot)$ is continuous and $p(\theta^0) > 0$.

As a heuristic for what Bayesian consistency in the ABC setting entails, consider the following
simple example. Assume $\Theta := [0, 1]$ and that $\theta \in \Theta$ has uniform prior probability. Consider
some $\delta > 0$, $\varepsilon > 0$, and assume we have data $y$ that is generated according to true (scalar) value
$\theta^0$. For $N = 5$ simulations the output is contained in Figure 1. For the particular $\delta > 0$ and
$\varepsilon > 0$ chosen, two points lie within $\mathcal{N}_\delta(\theta^0)$ and three points lie outside. Clearly, the ABC-based
posterior density $p_\varepsilon(\theta|\eta(y))$ only places mass on $\theta^1$ and $\theta^4$, as these points lead to a distance less
than $\varepsilon$, and zero mass is placed on the remaining three points. The ABC-based posterior density
$p_\varepsilon(\theta|\eta(y))$ will be Bayesian consistent if for any arbitrary $\delta > 0$ and some $\varepsilon > 0$ similar behavior
to that observed in Figure 1 holds as $T \to \infty$. This requires the following to be satisfied: one, for
any given $\delta > 0$, we must be able to simulate draws within $\mathcal{N}_\delta(\theta^0)$ (guaranteed by Assumption
[P]); two, for any $T$, including large $T$, there must exist a value of $\varepsilon$ such that the only draws
satisfying $d\{\eta(y), \eta(z)\} < \varepsilon$ are those in $\mathcal{N}_\delta(\theta^0)$; three, for the value of $\varepsilon$ in two, there must exist
a corresponding number of simulation draws $N(\varepsilon)$ such that at least one simulated $\theta^i \in \mathcal{N}_\delta(\theta^0)$
satisfying $d\{\eta(y), \eta(z(\theta^i))\} < \varepsilon$ occurs, else $p_\varepsilon(\theta|\eta(y))$ will not exist. The formalization of these
statements, along with the precise set of assumptions that a vector of summary statistics, $\eta(y)$,
should satisfy in order for ABC to yield consistent inference, is the content of Theorem 1 and
its proof. Subsequent to the presentation of the theorem, we provide a simple example in which
the conditions are satisfied, followed by a second example in which they are not. The way in
which an increase in the dimension of $\eta(y)$ can be used to retrieve consistency in the latter case
is then illustrated.

[Figure 1 placeholder: the distance $d\{\eta(y), \eta(z)\}$ (vertical axis, with the tolerance $\varepsilon$ marked)
plotted against $\theta \in \Theta$ (horizontal axis, with $\theta^0 - \delta$, $\theta^0$ and $\theta^0 + \delta$ marked); the draws
$\theta^2$, $\theta^5$ and $\theta^3$ lie above $\varepsilon$, and $\theta^4$ and $\theta^1$ lie below it.]

Figure 1: An illustration of ABC output for $N = 5$ simulations and given values of $\theta^0$, $\delta$, $\varepsilon$, and
$\eta(y)$.

Define the limiting value of the summary statistic based on observed data (respectively,
simulated data) as $b(\theta^0)$ (respectively, $b(\theta^i)$), and let $\|\cdot\|$ denote the Euclidean norm.

Theorem 1 Let $d\{\cdot,\cdot\}$ be an induced metric on the normed space $(\mathcal{B}, \|\cdot\|_*)$. Given summary
statistics $\eta(y)$, assume that the following conditions are satisfied:

[S0] The DGP for $y$ is uniquely defined at $\theta^0$.

[S1] $\|\eta(y) - b(\theta^0)\| = o_p(1)$.

[S2] The map $\theta^i \mapsto b(\theta^i)$ is deterministic, continuous, and satisfies

  [S2(1)] $\sup_{\theta \in \Theta} \|\eta(z(\theta)) - b(\theta)\| = o_p(1)$.

  [S2(2)] $b(\cdot)$ is one-to-one in $\theta^i$.

If [S0]-[S2] above and [P] are satisfied, then, for any $\delta > 0$, $\Pr_\varepsilon(\theta \notin \mathcal{N}_\delta(\theta^0)|\eta(y)) \to_p 0$ as
$T \to \infty$ and $\varepsilon \to 0$.


Remark 1 Theorem 1 states that, for ABC based on $\eta(\cdot)$ to be consistent, the limit map
$\theta^i \mapsto b(\theta^i)$ must act in the same manner as the “binding function” in indirect inference;
see Gourieroux et al. (1993) and Gourieroux and Monfort (1996) for a general discussion
of binding functions.

Remark 2 Since we are only concerned with Classical Bayesian consistency, Assumption [S0]
is implicit and therefore not explicitly required. However, Assumption [S0] is a deep
identification condition that may not be satisfied in all circumstances and is therefore
maintained to illustrate the scope of the models to which this result will apply. Assumption
[S1] is often satisfied under general conditions restricting the dependence in the observed
data. Assumption [S2(1)] requires that for all $\delta > 0$,

$$\lim_{T \to \infty} \Pr\Big( \sup_{\theta \in \Theta} \|\eta(z(\theta)) - b(\theta)\| > \delta \Big) = 0,$$

and is generally referred to as uniform convergence. This stronger notion of convergence
is required to ensure that the simulated paths $z(\theta)$, and the subsequent $\eta(z(\theta))$, are
well-behaved over $\Theta$. General conditions determining satisfaction of [S2(1)] are now well-known
and a great many results can be obtained from the empirical process literature; see, for
instance, Pollard (1990). In particular, [S2(1)] is likely to be satisfied for many different
types of summary statistics so long as the prior density $p(\cdot)$ admits values of $\theta$ that do
not allow the simulated data to display too much persistence.^2

Remark 3 Theorem 1 requires that the (vector of) summary statistics based on observed data
converges, with respect to $d\{\cdot,\cdot\}$, to a fixed quantity and the corresponding vector of
statistics based on simulated data $z^i = z(\theta^i)$ converges (uniformly), with respect to $d\{\cdot,\cdot\}$,
to a deterministic function of $\theta^i$. Consistency thus depends not only on the choice of $\eta(y)$
but also on the precise choice of $d\{\cdot,\cdot\}$, with convergence in one metric not necessarily
implying convergence in another. However, restricting $d: \mathcal{B} \times \mathcal{B} \to \mathbb{R}_+$ to be an induced
metric on the normed space $(\mathcal{B}, \|\cdot\|_*)$ - i.e., for $\eta_1, \eta_2 \in \mathcal{B}$, requiring that $d\{\eta_1, \eta_2\} =
\|\eta_1 - \eta_2\|_*$ for some norm $\|\cdot\|_*$ - relieves the convergence issue since all norms on $\mathcal{B}$ are
equivalent to the Euclidean norm $\|\cdot\|$. The requirement that $d\{\cdot,\cdot\}$ be an induced metric
is not restrictive as the most common choices of $d\{\cdot,\cdot\}$ satisfy this condition.

^2 Technically, conditions [S0] and [S2(1)] imply condition [S1]. However, the authors believe it is helpful to
specify separate conditions on the statistics associated with observed and simulated data.

Remark 4 Bayesian consistency says that for any $\delta > 0$, $p_\varepsilon(\theta|\eta(y))$ will attribute zero
probability, as $T \to \infty$, to points outside $\mathcal{N}_\delta(\theta^0)$; it does not say anything about how well
$p_\varepsilon(\theta|\eta(y))$ approximates the posterior density $p(\theta|y)$ or even the partial posterior density
$p(\theta|\eta(y))$. Specifically, the demonstration of Bayesian consistency is distinct from existing
theoretical work on ABC that shows $p_\varepsilon(\theta|\eta(y))$ is consistent for $p(\theta|\eta(y))$, as $N \to \infty$ and
$\varepsilon \to 0$, for any $\theta \in \Theta$ and for any fixed $T$. To prove the latter form of result, researchers
have borrowed from the literature on nonparametric density estimation and relied on the
idea of mean squared error (MSE) consistency, which requires the bias and variance of
$p_\varepsilon(\theta|\eta(y)) - p(\theta|\eta(y))$ to approach zero as $N \to \infty$ and $\varepsilon \to 0$; see, for example, Blum
(2010) and Biau et al. (2015). In particular, MSE consistency requires a specific rate
condition between $N$ and $\varepsilon$ to ensure that the variance of $p_\varepsilon(\theta|\eta(y)) - p(\theta|\eta(y))$ shrinks
to zero fast enough. As noted in the proof of Theorem 1, Bayesian consistency still requires
$N$ to increase as $\varepsilon \to 0$, but only to ensure that $p_\varepsilon(\theta|\eta(y))$ exists for small $\varepsilon$, and any
$T$. In this way, the particular relationship between $N$ and $\varepsilon$ is independent of the sample
size $T$. This lack of any $T$-dependent condition for $\varepsilon$ contrasts with the need for such a
condition when deriving results for the asymptotic distribution of ABC point estimators;
see, for example, Li and Fearnhead (2015). We elaborate further on this distinction in an
on-line supplementary appendix to the paper.^3

3.2 Success and Failure of Summary Statistic-based ABC 

Consistency of ABC based on $\eta(\cdot)$ hinges on the particular form of $b(\theta^i)$. If $b(\cdot)$ is one-to-one,
i.e., the map $\theta^i \mapsto b(\theta^i)$ satisfies [S2(2)], and the remaining assumptions in Theorem 1 hold,
ABC based on $\eta(\cdot)$ will be consistent. There is generally no guarantee that $b(\cdot)$ will be
one-to-one and satisfaction of this condition depends on both the true structural model and the
particular choice of summary statistics. Examples 1 and 2 illustrate a case where [S2(2)] is and
is not satisfied, respectively. Example 3 illustrates the impact on identification and, hence, the
attainment of consistency, of adding summary statistics to an initial set.

^3 This document is available at: http://users.monash.edu.au/~gmartin/FMR_Supplementary_Appendix.pdf.

Example 1 (Satisfaction of [S2(2)]) Consider the following autoregressive (AR) model of
order one:

$$y_t = \theta y_{t-1} + \nu_t,$$

where $\nu_t \sim$ i.i.d. $N(0, 1)$ and $|\theta| < 1$, $\theta \neq 0$. Whilst the likelihood for this model is known in
closed form and, hence, exact Bayesian inference is perfectly feasible, for the sake of illustration,
consider Algorithm 2 based on the summary statistic $\eta(y) = \frac{1}{T} \sum_{t=2}^{T} y_t y_{t-1}$.

Algorithm 2 ABC algorithm: AR(1) example
1: Simulate the AR(1) coefficient $\theta^i$ from, for instance, a uniform prior over $(-1, 1)$;
2: Generate an i.i.d. sequence $\{\nu_t^i\}_{t=1}^{T}$;
3: Produce a simulated series $\{z_t^i(\theta^i)\}_{t=1}^{T}$;
4: Accept the value simulated in Step (1) if $d\{\eta(y), \eta(z^i)\} < \varepsilon$ for $\varepsilon > 0$ and small.

Assume that some true value $\theta^0$ has generated the observed sample $y$. For $p_\varepsilon(\theta|\eta(y))$ to
be degenerate at $\theta^0$ it must be that $d\{b(\theta^0), b(\theta^i)\} = 0$ has a unique solution $\theta^i = \theta^0$. By the
weak law of large numbers $\eta(y) \to_p E[y_t y_{t-1}] = b(\theta^0) = \theta^0 / (1 - (\theta^0)^2)$ and
$\eta(z^i) \to_p E[z_t^i z_{t-1}^i] = b(\theta^i) = \theta^i / (1 - (\theta^i)^2)$, so $d\{b(\theta^0), b(\theta^i)\} = 0$ requires that
$0 = b(\theta^0) - b(\theta^i)$ or, after clearing the (strictly positive) denominators, that

$$0 = (\theta^i)^2 \theta^0 + \theta^i (1 - (\theta^0)^2) - \theta^0$$

has unique solution $\theta^i = \theta^0$. This quadratic equation in $\theta^i$ has two solutions: $\theta^i = \theta^0$ and
$\theta^i = -1/\theta^0$. However, given that $|\theta^0| < 1$, $\theta^0 \neq 0$, the second solution is not in the feasible
region for $\theta^i$ and so ABC based on $\eta(y) = \frac{1}{T} \sum_{t=2}^{T} y_t y_{t-1}$ satisfies the conditions of Theorem 1.
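The root calculation in Example 1 is easy to check numerically. The sketch below is illustrative only: the function name `b_ar1` and the value $\theta^0 = 0.5$ are hypothetical choices for the example, not values taken from the paper.

```python
import numpy as np

def b_ar1(theta):
    """Limit of the lag-one sample autocovariance for the AR(1) model with unit-variance noise."""
    return theta / (1.0 - theta**2)

theta0 = 0.5  # hypothetical true value, chosen only for illustration
# Quadratic in theta^i: theta0 * x^2 + (1 - theta0^2) * x - theta0 = 0
roots = np.roots([theta0, 1.0 - theta0**2, -theta0])
# Only roots inside the stationarity region |theta| < 1 are feasible.
feasible = [r for r in roots.real if abs(r) < 1.0]
```

Both roots give the same limiting summary, since `b_ar1(-1 / theta0)` equals `b_ar1(theta0)`, but only $\theta^0$ itself survives the restriction $|\theta^i| < 1$, which is why the prior support matters for identification here.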

Example 2 (Failure of [S2(2)]) Consider now the moving average (MA) model of order two:

$$y_t = e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2}, \qquad (3)$$

where $e_t \sim$ i.i.d. $N(0, 1)$ and $\theta_1$, $\theta_2$ satisfy the following invertibility conditions

$$-2 < \theta_1 < 2, \quad \theta_1 + \theta_2 > -1, \quad \theta_1 - \theta_2 < 1. \qquad (4)$$

Following Marin et al. (2011), we choose as summary statistics the sample autocovariances
$\eta_j(y) = \frac{1}{T} \sum_{t=1+j}^{T} y_t y_{t-j}$, $j = 0, 1, 2, \ldots$ Consider, initially, Algorithm 3, based on
$\eta(y) = (\eta_0(y), \eta_1(y))'$.

Algorithm 3 ABC algorithm: MA(2) example
1: Simulate the MA(2) coefficients $\theta^i$ from $p(\theta)$ satisfying (4), where $\theta = (\theta_1, \theta_2)'$;
2: Generate an i.i.d. sequence $\{e_t^i\}_{t=1}^{T}$;
3: Produce a simulated series $\{z_t^i(\theta^i)\}_{t=1}^{T}$;
4: Accept the value generated in Step (1) if $d\{\eta(y), \eta(z^i)\} < \varepsilon$ for $\varepsilon > 0$ and small.

Assume that true value $\theta^0 = (\theta_1^0, \theta_2^0)'$ has generated the observed data $y$. By the weak law
of large numbers $\eta_0(y) \to_p E[y_t^2] = 1 + (\theta_1^0)^2 + (\theta_2^0)^2$ and $\eta_1(y) \to_p E[y_t y_{t-1}] = \theta_1^0 (1 + \theta_2^0)$. In
addition, conditional on $\theta^i = (\theta_1^i, \theta_2^i)'$ satisfying equation (4), $\eta_0(z^i) \to_p 1 + (\theta_1^i)^2 + (\theta_2^i)^2$ and
$\eta_1(z^i) \to_p \theta_1^i (1 + \theta_2^i)$. For $p_\varepsilon(\theta|\eta(y))$ obtained from the above algorithm to be degenerate at $\theta^0$ it
must be that, for all $\theta^0$, $0 = b(\theta^0) - b(\theta^i)$ has unique solution $\theta^i = \theta^0$. Clearly,

$$0 = b(\theta^0) - b(\theta^i) = \begin{pmatrix} 1 + (\theta_1^0)^2 + (\theta_2^0)^2 \\ \theta_1^0 (1 + \theta_2^0) \end{pmatrix} - \begin{pmatrix} 1 + (\theta_1^i)^2 + (\theta_2^i)^2 \\ \theta_1^i (1 + \theta_2^i) \end{pmatrix}.$$

As in Marin et al. (2011), take $\theta_1^0 = .6$, $\theta_2^0 = .2$. Then the question becomes, does there exist
$\theta_1^i \neq .6$, $\theta_2^i \neq .2$ such that

$$0 = b(\theta^0) - b(\theta^i) = \begin{pmatrix} 1 + (.6)^2 + (.2)^2 \\ .6 (1 + .2) \end{pmatrix} - \begin{pmatrix} 1 + (\theta_1^i)^2 + (\theta_2^i)^2 \\ \theta_1^i (1 + \theta_2^i) \end{pmatrix}? \qquad (5)$$

Simple numerical calculations reveal that (5) has two solutions: $\theta_1^i = .6$, $\theta_2^i = .2$ and
$\theta_1^i \approx .5453$, $\theta_2^i \approx .3204$, where the latter solution remains in the feasible region for $\theta^i = (\theta_1^i, \theta_2^i)'$.
Therefore, $b(\cdot)$ is not one-to-one and the ABC-based posterior will not converge to $\theta^0 = (.6, .2)'$.
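The "simple numerical calculations" behind the two solutions of (5) can be reproduced as follows. This is one possible route, not necessarily the one used by the authors: substituting $u = 1 + \theta_2^i$ and $\theta_1^i = \gamma_1 / u$ (with $\gamma_0$, $\gamma_1$ the two limiting summaries at $\theta^0$) reduces (5) to a quartic in $u$, whose real roots are then mapped back. The names `b_ma2`, `g0`, `g1` and `u_roots` are hypothetical.

```python
import numpy as np

def b_ma2(theta1, theta2):
    """Limits of (eta_0, eta_1) for the MA(2) model: variance and lag-one autocovariance."""
    return np.array([1.0 + theta1**2 + theta2**2, theta1 * (1.0 + theta2)])

g0, g1 = b_ma2(0.6, 0.2)
# With u = 1 + theta2 and theta1 = g1 / u, matching both limits reduces to
# u^4 - 2 u^3 + (2 - g0) u^2 + g1^2 = 0.
u_roots = np.roots([1.0, -2.0, 2.0 - g0, 0.0, g1**2])
u_real = np.sort(u_roots[np.abs(u_roots.imag) < 1e-8].real)
solutions = [(g1 / u, u - 1.0) for u in u_real]   # (theta1, theta2) pairs
```

The list `solutions` contains the true value and one rival pair, and both can be checked against the invertibility conditions (4); the remaining two roots of the quartic are complex and so do not correspond to parameter values.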


Example 3 (Effect of Additional Statistics) Consider the same MA(2) model as in
Example 2, but now consider the use of the three-dimensional vector of summary statistics:

$$\eta(y) = (\eta_0(y), \eta_1(y), \eta_2(y))'.$$

In the language of the generalized method of moments (GMM) literature, the summary statistics
of which $\eta(y)$ is comprised “over-identify” $\theta^0$. In this case, [S2(2)] will be satisfied if the following
equation has a unique solution for all $\theta^i, \theta^0 \in \Theta$:

$$0 = b(\theta^0) - b(\theta^i) = \begin{pmatrix} 1 + (\theta_1^0)^2 + (\theta_2^0)^2 \\ \theta_1^0 (1 + \theta_2^0) \\ \theta_2^0 \end{pmatrix} - \begin{pmatrix} 1 + (\theta_1^i)^2 + (\theta_2^i)^2 \\ \theta_1^i (1 + \theta_2^i) \\ \theta_2^i \end{pmatrix}.$$

The additional (linear) restriction, $0 = \theta_2^0 - \theta_2^i$, ensures that the only value that satisfies
$0 = b(\theta^0) - b(\theta^i)$ is now $\theta^0 = (\theta_1^0, \theta_2^0)'$, and consistency will be achieved as a consequence.

Simply adding summary statistics to the ABC procedure is, however, not guaranteed to yield
consistent inference: the chosen summary statistics must be informative about the underlying
parameters $\theta$ governing the statistical properties of the structural model. To illustrate this point,
consider again the above example, but with the three-dimensional vector summary statistic:

$$\eta(y) = (\eta_0(y), \eta_1(y), \eta_3(y))',$$

where $\eta_3(y) = \frac{1}{T} \sum_{t=4}^{T} y_t y_{t-3}$. Given the nature of the structural model, $\eta_3(y) \to_p E[y_t y_{t-3}] = 0$
and by construction $\eta_3(z^i) \to_p E[z_t^i z_{t-3}^i] = 0$ for all $\theta^i$. Hence, the summary statistic $\eta_3(y)$ yields
no new information about $\theta^0$ and does not therefore produce a mapping $\theta^i \mapsto b(\theta^i)$ that is
one-to-one.
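The contrast between the two augmented summary vectors in Example 3 can be seen by evaluating the limiting summaries at the true value and at the rival solution from Example 2. This is a small illustrative sketch: `b_full` is a hypothetical helper stacking the limits of $\eta_0, \eta_1, \eta_2, \eta_3$, and the rival pair is the rounded value reported in Example 2.

```python
import numpy as np

def b_full(theta1, theta2):
    """Limits of (eta_0, eta_1, eta_2, eta_3) for the MA(2) model."""
    return np.array([
        1.0 + theta1**2 + theta2**2,   # E[y_t^2]
        theta1 * (1.0 + theta2),       # E[y_t y_{t-1}]
        theta2,                        # E[y_t y_{t-2}]
        0.0,                           # E[y_t y_{t-3}]
    ])

true_value = (0.6, 0.2)
rival = (0.5453, 0.3204)   # second solution reported in Example 2 (rounded)
gap = np.abs(b_full(*true_value) - b_full(*rival))
gap_012 = gap[[0, 1, 2]].max()   # using (eta_0, eta_1, eta_2)
gap_013 = gap[[0, 1, 3]].max()   # using (eta_0, eta_1, eta_3)
```

Here `gap_012` is bounded away from zero, so the rival value is separated from $\theta^0$ once $\eta_2$ is included, whereas `gap_013` is zero up to the rounding of the rival value, so adding $\eta_3$ leaves the two parameter values indistinguishable in the limit.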


We illustrate the theoretical results in Examples 2 and 3 graphically in Figure 2, denoting
the three relevant vectors of summary statistics as:

$$\eta^1(y) = (\eta_0(y), \eta_1(y))',$$
$$\eta^2(y) = (\eta_0(y), \eta_1(y), \eta_2(y))',$$
$$\eta^3(y) = (\eta_0(y), \eta_1(y), \eta_3(y))'.$$

Using the true parameter vector $\theta^0 = (\theta_1^0, \theta_2^0)' = (0.6, 0.2)'$, a vector of ‘observed’ data,
$y = (y_1, y_2, \ldots, y_T)'$, is generated, for $T$ = 100, 200, 500, 1,000 and 5,000. For each given sample of
size $T$, $p(\theta|y)$ is then estimated via the ABC method, using $N$ = 50,000 simulated draws from
uniform priors satisfying (4), and with the tolerance $\varepsilon_j$, $j = 1, 2, 3$, chosen so that only one
percent of the simulated draws are accepted. The top two panels of Figure 2 plot $p_{\varepsilon_1}(\theta_1|\eta^1(y))$
and $p_{\varepsilon_1}(\theta_2|\eta^1(y))$ respectively, where the notation here indicates the kernel density estimate
of the relevant marginal density, conditional on $\eta^1(y)$, and as defined for the given $\varepsilon_1$. As the
sample size increases both estimated marginals become more concentrated, but not around the
true values of 0.6 and 0.2. In contrast, the plots in the two middle panels demonstrate the
consistency that obtains when conditioning on $\eta^2(y)$, a result that is not replicated in the two
bottom panels, in which the three-dimensional conditioning vector is $\eta^3(y)$.^4


[Figure 2 placeholder: six panels titled "ABC marginal posteriors: MA(2) model".]

Figure 2: ABC-based estimates of the marginal posterior densities for the parameters of
the MA(2) model, $\theta_1$ and $\theta_2$, with varying sample sizes. Top two panels: summary
statistic vector of $\eta^1(y) = (\eta_0(y), \eta_1(y))'$; Middle two panels: summary statistic vector of
$\eta^2(y) = (\eta_0(y), \eta_1(y), \eta_2(y))'$; Bottom two panels: summary statistic vector of $\eta^3(y) =
(\eta_0(y), \eta_1(y), \eta_3(y))'$. The true parameter values are $\theta_1^0 = 0.6$ and $\theta_2^0 = 0.2$.


Remark 5 The above example illustrates that adding additional summary statistics to an ABC
procedure may or may not aid researchers in obtaining consistent inference. In particular,
adding summary statistics will only be helpful if the additional statistics contain
information about the parameters that is not accounted for by the summary statistics already
used in the analysis. Therefore, arbitrarily adding summary statistics will not necessarily
yield valid inference. Moreover, and as was noted in Section 2, given that adding summary
statistics hampers our ability to accurately estimate the associated conditional density,
adding summary statistics to any initial ABC procedure should be embarked upon with
care.

^4 Whilst we have not pursued this in any formal way, the indications are that in the two cases in which
identification (of the true parameters) does not obtain, the marginal posteriors are some form of mixture distribution,
each with a mode (or modes) that reflects (reflect) the location of the two pairs of parameter values that satisfy
(5).

Remark 6 It is also important to note that no link is to be expected between the particular
model at hand and the likelihood of Assumption [S2(2)] being satisfied. As the above
examples illustrate, it is the combination of the model structure and the choice of summary
statistics that determines Bayesian consistency via ABC.

4 Detecting Consistency 

4.1 Preliminaries 

Beyond understanding the theoretical conditions that must hold in order for a particular set of 
summary statistics to yield valid inference, and noting that in complex settings verifying the 
conditions of Theorem 1 will typically not be possible via analytical means, it is useful to have 
some way of ascertaining numerically whether those conditions actually hold in any given case. 
To this end, we present a diagnostic tool that can be used to determine if the estimated posterior 
obtained using a specific set of summary statistics, say $\eta(y)$, is Bayesian consistent.

The key insight to understanding the diagnostic procedure is that if the true value $\theta^0$ were
known, we would only require a local version of the identification condition (Assumption [S2(2)]);
i.e., we would only need to check that there existed no $\theta^i$, with $\theta^i \neq \theta^0$, for which $b(\theta^i) = b(\theta^0)$.
However, because $\theta^0$ is unknown, a sufficient condition to ensure that the above holds is that the
map $\theta \mapsto b(\theta)$ is one-to-one; i.e., that $b(\theta^0) - b(\theta^i) = 0$ yields the unique solution $\theta^i = \theta^0$ for
each and every possible value of $\theta^0$. In this way, detecting Bayesian consistency in ABC reduces
to detecting satisfaction of the one-to-one mapping assumption. The diagnostic procedure we
propose seeks to verify this condition, and hence the consistency of ABC posterior estimates, in
two stages: firstly, as it is applied to the observed data $y$ (Section 4.2), and secondly, in terms of
its repeated application to data sets artificially generated from the assumed true data generating
process and across the feasible parameter space (Section 4.3).

The verification procedure exploits the following two facts: 1) under the conditions of Theorem 1,
the possible set of solutions for which $d\{\eta(y), \eta(z(\theta))\} = o_p(1)$ always includes
$\theta = \theta^0$; 2) if $d\{\eta(y), \eta(z(\theta))\} = o_p(1)$ uniquely at $\theta = \theta^0$, then
an ABC procedure based on an augmented vector of summary statistics,
$\gamma(y) = (\eta(y)', g(y)')'$, will yield a posterior that is Bayesian consistent, so long as
$g(\cdot)$ satisfies conditions [S1] and [S2(1)] of Theorem 1. To state these results more
formally, assume $\eta(y)$ (respectively, $g(y)$) has a well-defined limit $b_1(\theta^0)$
(respectively, $b_2(\theta^0)$) and denote the limit quantity of $\eta(z^i)$ (respectively, $g(z^i)$)
as $b_1(\theta^i)$ (respectively, $b_2(\theta^i)$).

Corollary 1 Given summary statistics $\gamma(y) = (\eta(y)', g(y)')'$, assume that the following
conditions are satisfied:

[C0] The DGP for y is uniquely defined at $\theta^0$.

[C1] For $b(\theta^0) = (b_1(\theta^0)', b_2(\theta^0)')'$, we have $\|\gamma(y) - b(\theta^0)\| = o_p(1)$.

[C2] The map $\theta^i \mapsto b(\theta^i)$ is deterministic, continuous, exists for all
$\theta^i \in \Theta$, and satisfies

[C2(1)] $\sup_{\theta \in \Theta} \|\gamma(z(\theta)) - b(\theta)\| = o_p(1)$,

[C2(2)] $b_1(\cdot)$ is one-to-one in $\theta^i$.

If [C0]-[C2] and [P] are satisfied, then, for all $\delta > 0$,
$\Pr_\varepsilon(\theta \notin \mathcal{N}_\delta(\theta^0) \mid \gamma(y)) \to_p 0$ as $T \to \infty$
and $\varepsilon \to 0$.

4.2 Use of the Observed Data 

To understand the implications of Corollary 1, consider the case where we have already obtained
$p_{\varepsilon_1}(\theta \mid \eta(y))$, for some tolerance $\varepsilon_1$, with $\eta(y)$ based on a
value of T that is assumed to be large enough for large sample behavior to be in evidence. Now, if
we were to run ABC again using the joint summary statistic $\gamma(y)$, Corollary 1 implies that one
of two things will happen: either the posterior $p_{\varepsilon_2}(\theta \mid \gamma(y))$, computed
for some tolerance $\varepsilon_2$, will be located in a very similar position to
$p_{\varepsilon_1}(\theta \mid \eta(y))$, only potentially flatter or with a slightly different shape,
a consequence of the increased dimensionality,^5 or the high mass region of
$p_{\varepsilon_2}(\theta \mid \gamma(y))$ will be located in a distinctly different part of the
support from that of $p_{\varepsilon_1}(\theta \mid \eta(y))$. We refer to this latter event
as one of $p_{\varepsilon_2}(\theta \mid \gamma(y))$ “jumping away” from
$p_{\varepsilon_1}(\theta \mid \eta(y))$ and, according to Corollary 1, see the
occurrence of this event as evidence that the initial summary statistics did not yield a posterior
that is Bayesian consistent. If, on the other hand, the addition of $g(y)$ does not cause the mass
of $p_{\varepsilon_2}(\theta \mid \gamma(y))$ to jump in relation to
$p_{\varepsilon_1}(\theta \mid \eta(y))$, then this suggests that the initial choice of
summary statistics, $\eta(y)$, may have yielded valid inference. The use of the word ‘may’ reflects
the fact that there is no guarantee possible, via use of the observed data alone, that consistency
has been achieved, since there is no guarantee a priori that
$d\{\eta(y), \eta(z(\theta))\} = o_p(1)$ has a unique solution $\theta = \theta^0$. It is this point
that is addressed in the next subsection.

^5 Simulation evidence suggests that the increased flatness (or otherwise) of the subsequent posterior estimates
depends on the nature of the information about $\theta^0$ contained in the additional summary statistics.
ABC marginal posteriors: MA(2) model, adding summary statistics

Figure 3: ABC-based estimates of the marginal posterior densities for the parameters of the
MA(2) model, $\theta_1$ and $\theta_2$, with T = 5000. The key for both panels indicates the summary
statistic vector used: $\eta^1(y) = (\eta_0(y), \eta_1(y))'$; $\eta^2(y) = (\eta^1(y)', \eta_2(y))'$;
$\eta^3(y) = (\eta^2(y)', \eta_3(y))'$; $\eta^4(y) = (\eta^3(y)', \frac{1}{T}\sum_{t=1}^{T} y_t)'$;
$\eta^5(y) = (\eta^4(y)', \frac{1}{T}\sum_{t=1}^{T} y_t^3)'$. The statistics $\eta^2(y)$ to
$\eta^5(y)$ yield Bayesian consistency. The true parameter values are $\theta_1^0 = 0.6$ and
$\theta_2^0 = 0.2$.

Meanwhile, we illustrate this preliminary diagnostic exercise via the MA(2) model, in which
case (from Example 3) we have an analytical result that establishes that consistency for the true
$\theta^0 = (\theta_1^0, \theta_2^0)' = (.6, .2)'$ is achieved via a particular choice of summary
statistics. We adopt five different choices of summary statistics for use in the illustration:

$\eta^1(y) = (\eta_0(y), \eta_1(y))'$

$\eta^2(y) = (\eta^1(y)', \eta_2(y))'$

$\eta^3(y) = (\eta^2(y)', \eta_3(y))'$

$\eta^4(y) = (\eta^3(y)', \frac{1}{T}\sum_{t=1}^{T} y_t)'$

$\eta^5(y) = (\eta^4(y)', \frac{1}{T}\sum_{t=1}^{T} y_t^3)'$

where $\eta_j(y) = \frac{1}{T}\sum_{t=1+j}^{T} y_t y_{t-j}$, for j = 0, 1, 2, 3. We set the sample size
to T = 5,000, consider N = 50,000 simulations and set the tolerance $\varepsilon_j$, j = 1, ..., 5,
so that we retain one percent of the simulated draws for each choice of summary statistics. From
our previous theoretical analysis we know that $\eta^1(y)$ will not yield an estimated posterior,
$p_{\varepsilon_1}(\theta \mid \eta^1(y))$, that is Bayesian consistent, while the remaining sets will
yield posteriors that are Bayesian consistent, due to the inclusion of $\eta_2(y)$. Therefore, after
adding $\eta_2(y)$ to our initial choice of summary statistics, $\eta^1(y)$, the estimated posterior
$p_{\varepsilon_2}(\theta \mid \eta^2(y))$ should be centered around the true values, or thereabouts
(given the still finite value of T); that is, the main mass of the posterior computed using
$\eta^2(y)$ should “jump” away from the main mass of the posterior computed using $\eta^1(y)$.
Subsequently, the posteriors based on summary statistics $\eta^3(y)$, $\eta^4(y)$ and $\eta^5(y)$
should not move much, if at all, in relation to $p_{\varepsilon_2}(\theta \mid \eta^2(y))$ but may
possibly become flatter, and possibly change shape, with each additional summary statistic.
Figure 3 illustrates these points exactly. The estimated posterior
$p_{\varepsilon_2}(\theta \mid \eta^2(y))$ is seen to shift substantially in relation to the estimated
posterior $p_{\varepsilon_1}(\theta \mid \eta^1(y))$. In turn, adding $\eta_3(y)$ and
$\frac{1}{T}\sum_{t=1}^{T} y_t$ to $\eta^2(y)$ causes minimal change, and certainly no discernible
change in location. The location of the high mass point is preserved by the subsequent addition
of $\frac{1}{T}\sum_{t=1}^{T} y_t^3$; however, at this point the dimension of the full statistic
$\eta^5(y)$ appears to cut in, with the accuracy of the kernel density estimation adversely
affected.^6
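
The augmentation exercise described above can be mimicked on a small scale. The following is a minimal sketch, not the authors' implementation: it assumes a uniform prior on the triangle $-2 < \theta_1 < 2$, $\theta_1 + \theta_2 > -1$, $\theta_1 - \theta_2 < 1$, a raw Euclidean distance, and much smaller values of T and N than those used in the text so that it runs quickly. All function names are hypothetical.

```python
import random

def simulate_ma2(theta1, theta2, T, rng):
    """Simulate y_t = e_t + theta1*e_{t-1} + theta2*e_{t-2}, with e_t ~ N(0, 1)."""
    e = [rng.gauss(0.0, 1.0) for _ in range(T + 2)]
    return [e[t + 2] + theta1 * e[t + 1] + theta2 * e[t] for t in range(T)]

def autocov(y, j):
    """eta_j(y) = (1/T) * sum_{t=1+j}^{T} y_t * y_{t-j}."""
    T = len(y)
    return sum(y[t] * y[t - j] for t in range(j, T)) / T

def summaries(y, lags):
    """Stack the autocovariances at the requested lags into one summary vector."""
    return [autocov(y, j) for j in lags]

def draw_prior(rng):
    """Uniform draw on the assumed triangular parameter region (an assumption)."""
    while True:
        th1, th2 = rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)
        if th1 + th2 > -1.0 and th1 - th2 < 1.0:
            return th1, th2

def abc_rejection(y, lags, N, keep_frac, rng):
    """Keep the keep_frac share of prior draws whose summaries are closest to eta(y)."""
    eta_obs = summaries(y, lags)
    scored = []
    for _ in range(N):
        th = draw_prior(rng)
        eta_sim = summaries(simulate_ma2(th[0], th[1], len(y), rng), lags)
        dist = sum((a - b) ** 2 for a, b in zip(eta_obs, eta_sim)) ** 0.5
        scored.append((dist, th))
    scored.sort(key=lambda pair: pair[0])
    return [th for _, th in scored[:max(1, int(keep_frac * N))]]

def posterior_mean(draws):
    """Componentwise mean of the retained draws."""
    n = len(draws)
    return (sum(d[0] for d in draws) / n, sum(d[1] for d in draws) / n)

if __name__ == "__main__":
    rng = random.Random(0)
    y = simulate_ma2(0.6, 0.2, 300, rng)
    # Compare the location of the retained draws before and after adding eta_2(y).
    for label, lags in (("eta^1", (0, 1)), ("eta^2", (0, 1, 2))):
        kept = abc_rejection(y, lags, N=1000, keep_frac=0.05, rng=rng)
        print(label, posterior_mean(kept))
```

Comparing the retained draws across the two choices of `lags` gives a crude numerical counterpart to the visual comparison of the posterior estimates; a kernel density estimate of the retained draws would be needed to reproduce plots of the kind described in the text.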


ABC marginal posteriors: MA(2) model, adding uninformative statistics

Figure 4: ABC-based estimates of the marginal posterior densities for the parameters of the
MA(2) model, $\theta_1$ and $\theta_2$, with T = 5000. The key for both panels indicates the summary
statistic vector used: $\eta^1(y) = (\eta_0(y), \eta_1(y))'$; $\eta^6(y) = (\eta^1(y)', \eta_3(y))'$;
$\eta^7(y) = (\eta^6(y)', \frac{1}{T}\sum_{t=1}^{T} y_t^3)'$. None of the three summary statistics
yields Bayesian consistency. The true parameter values are $\theta_1^0 = 0.6$ and
$\theta_2^0 = 0.2$.

Let us now consider a similar exercise with summary statistics $\eta^1(y) = (\eta_0(y), \eta_1(y))'$,
$\eta^6(y) = (\eta^1(y)', \eta_3(y))'$, $\eta^7(y) = (\eta^6(y)', \frac{1}{T}\sum_{t=1}^{T} y_t^3)'$
and $\eta^8(y) = (\eta^7(y)', \eta_2(y))'$, where we deliberately
use different notation to distinguish these statistics from those used in the illustration above.
In this particular setup the only set of summary statistics that will yield consistent inference
is $\eta^8(y)$, and the aim of the exercise is to illustrate the differential impact of adding
non-informative and informative summary statistics to an initial set that does not yield
identification. First, consider Figure 4, which plots posteriors based only on $\eta^1(y)$,
$\eta^6(y)$ and $\eta^7(y)$. Adding $\eta_3(y)$ to $\eta^1(y)$ to produce $\eta^6(y)$ (which we know
does not yield identification) causes the estimated posterior
$p_{\varepsilon_6}(\theta \mid \eta^6(y))$ to flatten out compared to
$p_{\varepsilon_1}(\theta \mid \eta^1(y))$, and to shift slightly. Now, adding the statistic
$\frac{1}{T}\sum_{t=1}^{T} y_t^3$ to $\eta^6(y)$, knowing as we do that this statistic will also not
aid in identification,^7 the posterior $p_{\varepsilon_7}(\theta \mid \eta^7(y))$ becomes even
flatter (reflecting the increased dimension) and continues to shift away from the posterior mode
of both previously estimated posteriors, a clear indication that we did not yet have a valid set
of summary statistics on the previous rounds.

^6 Results for T = 1,000 and T = 10,000 were also considered. The resulting plots paint a similar picture and
hence have not been included for brevity.

ABC marginal posteriors: MA(2) model, adding informative statistics

Figure 5: ABC-based estimates of the marginal posterior densities for the parameters of the
MA(2) model, $\theta_1$ and $\theta_2$, with T = 5000. The key for both panels indicates the summary
statistic vector used: $\eta^1(y) = (\eta_0(y), \eta_1(y))'$; $\eta^6(y) = (\eta^1(y)', \eta_3(y))'$;
$\eta^7(y) = (\eta^6(y)', \frac{1}{T}\sum_{t=1}^{T} y_t^3)'$; $\eta^8(y) = (\eta^7(y)', \eta_2(y))'$.
The first three summary statistics do not yield Bayesian consistency.
The fourth is associated with consistency and the marginal posterior estimates are shown to
differ markedly from the first three as a consequence. The true parameter values are
$\theta_1^0 = 0.6$ and $\theta_2^0 = 0.2$.

In Figure 5 we then superimpose on these three plots the estimated marginal posterior based
on $\eta^8(y)$, where we know that the combination of $\eta^7(y)$ and $\eta_2(y)$ (which defines
$\eta^8(y)$) contains sufficient information for the parameters to now be identified. The change in
the estimated posterior $p_{\varepsilon_8}(\theta \mid \eta^8(y))$, relative to the existing three, is
marked, with a clear peak observed around the true values, $\theta_1^0 = 0.6$ and
$\theta_2^0 = 0.2$. Subsequent additions of statistics to this set will, along the lines illustrated
in Figure 3, produce posteriors that now remain reasonably fixed at the same modal value and that
vary only in terms of dispersion, if at all.^8

^7 For the MA(2) model, with $e_t \sim$ i.i.d. $N(0, 1)$,
$E(y_t^3) = E(e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2})^3$ is composed of four
different types of moments: $E(e_{t-k}^3)$, $E(e_{t-k}^2 e_{t-j})$, $E(e_{t-k} e_{t-j}^2)$ and
$E(e_{t-k} e_{t-j} e_{t-l})$, for $l \neq k \neq j$, which are all
zero for any value of $\theta = (\theta_1, \theta_2)'$.

The results of this section are summed up in the following remarks: 

Remark 7 Corollary 1 says that if ABC based on summary statistics $\eta(y)$ yields consistent
inference, adding more information, in the form of additional summary statistics, will
never invalidate the inference. Typically, adding further statistics can ‘dull’ the inference,
in terms of producing a more dispersed posterior, or a posterior with slightly different
shape; but it will not shift the mode. Hence, repeated augmentation of an initial choice of
statistics, whereby the mode of the estimated posterior eventually ‘settles’ at a particular
location, should instill some confidence in the mind of the investigator that consistency
may have been achieved.

Remark 8 Whilst the impact of adding non-informative statistics to an initially non-informative
set is likely to be problem-specific, we speculate that small and continual changes in both
location and dispersion are indicative that a sufficiently informative set of summary
statistics has not yet been located. A more substantial shift at some point, followed by a lack of
change in the location at least, with the subsequent additions of statistics, is indicative that
identification and, hence, consistency, may have been achieved. As flagged above, however,
important caveats pertaining to this statement are pursued in the following section.

Remark 9 The above procedure has a similar flavor to the stepwise search algorithm proposed
in Joyce and Marjoram (2008). Despite this apparent similarity, however, the two
procedures differ in terms of their details, as well as having very different objectives. To
wit, whilst the approach outlined above is concerned with obtaining a vector of summary
statistics that yields consistent inference, that of Joyce and Marjoram is concerned with
obtaining a vector of summary statistics that is as informative as possible (or ‘approximately
sufficient’ to use their terminology) for any given T.

4.3 Use of Repeated Simulation 

We have demonstrated how the numerical procedure proposed above can determine with some
certainty whether or not $p_\varepsilon(\theta \mid \eta(y))$, for some choice of $\eta(y)$, is
concentrating at $\theta^0$, in the artificial scenario in which $\theta^0$ is known. In practice,
of course, the true value $\theta^0$ is unknown, and the proposed method is not capable of
distinguishing between $p_\varepsilon(\theta \mid \eta(y))$ concentrating at $\theta^0$
and $p_\varepsilon(\theta \mid \eta(y))$ concentrating at some other value
$\theta^i \neq \theta^0$, satisfying $b(\theta^i) = b(\theta^0)$. However,
if the binding function is one-to-one, perverse situations such as the above can be ruled out.

^8 The experiments in Section 4.2 are conducted using raw distances (no component scaling). However, the
experiments were also conducted using distances scaled by the sample covariance matrix of the summary
statistics, and with individual elements scaled by their simulated variance. Results based on these
alternative scaling measures are not qualitatively different from those presented herein. The results are
available from the authors upon request.

For a fixed (vector) summary statistic, $\eta(y)$, it turns out that verifying whether or not the
binding function is one-to-one is, in principle, possible. To understand how we can verify this
condition, first recall that $b(\theta)$ is simply the limit, as $T \to \infty$, of the simulated
summary statistics $\eta(z(\theta))$, and note that because $z(\theta)$ is simulated from the
structural model, $z(\theta)$ is no longer restricted to be of the same length as the observed
sample y. From these facts, we see that our ability to obtain $b(\theta)$ is limited only by
computational power and time; i.e., we are limited only by our ability to simulate (very) long
trajectories for $z(\theta)$. In addition, the entire map $\theta \mapsto b(\theta)$ can be obtained
simply by simulating long trajectories of $z(\theta)$, forming $\eta(z(\theta))$, and repeating the
exercise at every $\theta \in \Theta$. Therefore, with enough computing power (and time),
it is theoretically possible to verify whether or not $\theta \mapsto b(\theta)$ is one-to-one.

While the above logic demonstrates that it is theoretically possible to verify the one-to-one
condition, it is not practically possible as this approach (technically) requires simulating an
infinite number of infinite series. However, when the data are stationary and the parameter space
relatively small, we can approximately check this condition through the following steps:


Algorithm 4 One-to-one verification

1: Use the empirical procedure in Section 4.2 to identify a (vector) summary statistic of interest,
hereafter denoted $\eta(\cdot)$.

2: Choose K* distinct parameter values with which to simulate data from the structural model,
call them $\theta^{0,1}, \ldots, \theta^{0,K^*}$. Choose a large integer $T^* \gg T \gg 0$.

3: Simulate $z^k = z(\theta^{0,k})$ of length $T^*$, k = 1, ..., K*, and form the series
$\{\eta(z^k)\}_{k=1}^{K^*}$; $\{\eta(z^k)\}_{k=1}^{K^*}$ constitutes a discrete approximation to
$\theta \mapsto b(\theta)$.

4: Determine whether or not $\{\eta(z^k)\}_{k=1}^{K^*}$ contains K* unique elements.
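
The steps of Algorithm 4 can be sketched as follows for the MA(2) example. This is an illustrative sketch under stated assumptions rather than a definitive implementation: because simulated summaries are never exactly equal, "unique" is interpreted here as "further apart than a user-chosen tolerance `tol`", which is our assumption and not part of the algorithm as stated. The grid, `T_star` and `tol` values, and all function names, are hypothetical.

```python
import random
from itertools import combinations

def simulate_ma2(theta, T, rng):
    """Simulate y_t = e_t + theta[0]*e_{t-1} + theta[1]*e_{t-2}, with e_t ~ N(0, 1)."""
    e = [rng.gauss(0.0, 1.0) for _ in range(T + 2)]
    return [e[t + 2] + theta[0] * e[t + 1] + theta[1] * e[t] for t in range(T)]

def eta(y, lags):
    """Sample autocovariances (1/T) * sum_t y_t * y_{t-j} at the requested lags."""
    T = len(y)
    return [sum(y[t] * y[t - j] for t in range(j, T)) / T for j in lags]

def near_duplicates(points, stats, tol):
    """Pairs of distinct parameter values whose summaries differ by less than tol."""
    pairs = []
    for i, j in combinations(range(len(points)), 2):
        gap = max(abs(a - b) for a, b in zip(stats[i], stats[j]))
        if gap < tol:
            pairs.append((points[i], points[j]))
    return pairs

def check_one_to_one(points, lags, T_star, tol, seed=0):
    """Steps 3 and 4: simulate long series at each point, then look for collisions."""
    rng = random.Random(seed)
    stats = [eta(simulate_ma2(th, T_star, rng), lags) for th in points]
    return near_duplicates(points, stats, tol)

if __name__ == "__main__":
    # Step 2: a coarse grid of K* = 16 parameter values (illustrative only).
    grid = [(a / 4, b / 4) for a in range(-3, 4, 2) for b in range(-3, 4, 2)]
    collisions = check_one_to_one(grid, lags=(0, 1, 2), T_star=20000, tol=0.05)
    print("candidate violations of one-to-one:", collisions)
```

An empty list is consistent with, but does not prove, a one-to-one binding function on the grid; any reported pair flags parameter values that the chosen summaries cannot separate at the given tolerance.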


For K* and T* large enough, if $\eta(\cdot)$ satisfies Step (4), and if the estimated
$p_\varepsilon(\theta \mid \eta(y))$, as based on the observed data, y, is also collapsing toward some
point, one should conclude in favor of consistency. Of course, this process leaves much
unspecified, with the most critical issues being how to span the parameter space, how to select
the set of pre-specified statistics, and the order in which they are to be explored, plus the
manner in which degeneracy of the estimated posterior is tested for as T is allowed to increase.
However, providing guidelines for and proving the theoretical properties of any such search
procedure would require several layers of formalization and the introduction of new terms and
concepts that would detract from the current message of the paper. Hence, at this stage we simply
emphasize that a completely satisfactory assessment of consistency would appear to require both
the use of the observed data and repeated application of data from the assumed process; and
suggest that the sort of exercise we are proposing here, albeit informal, is a sensible one to
pursue.
5 Criterion Functions based on an Auxiliary Model

5.1 Consistency of Auxiliary Model-based ABC 


At a minimum, implementation of ABC requires some means of generating summary statistics
$\eta(y)$ that are “informative” about the unknown parameters of the underlying structural
model, where by “informative” it is generally meant that the summary statistics are a useful
way of characterizing the information contained in the observed data. However, the previous
sections demonstrate that care must be taken to ensure that the chosen summary statistics yield
consistent inference.

An alternative way of obtaining informative summary statistics is through the use of an auxiliary
model that depends on parameters $\beta \in B \subset \mathbb{R}^{d_\beta}$, where
$d_\beta \geq p = \dim(\Theta)$, and for which the likelihood function of the auxiliary model,
denoted by $L(y; \beta)$, is known in closed form. Given a simple auxiliary likelihood
$L(y; \beta)$, a growing literature suggests using summary statistics generated from
$L(y; \beta)$; for example, one can choose $\eta(y) = \hat{\beta}(y)$, where
$\hat{\beta}(y) = \arg\max_{\beta \in B} L(y; \beta)$, or $\eta(y)$ equivalent to the vector score of
$L(y; \beta)$ evaluated at $\hat{\beta}(y)$. However, by its very nature
the auxiliary model, and by proxy the summary statistics derived from $L(y; \beta)$, is (are) likely to
describe only certain salient features of the underlying structural model. In particular, there is
generally no reason to believe that the auxiliary model should “nest” the true structural model
in some well-defined sense. Indeed, if it does so then this suggests either that the structural
model itself is tractable - hence excluding the need for ABC - or that the nesting model is
highly parameterized, thereby inducing an $\eta(\cdot)$ of high dimension and the associated problems
for accuracy.

Given then that a typical auxiliary model is capable of representing only certain salient 
features of the DGP, there is nothing particularly special about choosing the auxiliary likelihood 
function to generate summary statistics for use within ABC. Moreover, in many cases a realistic 
auxiliary model may yield a likelihood function that is itself too complicated for ABC, from 
a purely computational standpoint, whilst an alternative criterion function, based on the same 
auxiliary model, may yield computationally simpler summary statistics. For example, alternative 
criterion functions - other than an auxiliary likelihood - that could be used inside an ABC 
algorithm include: sums of squared errors, least absolute deviations, and even quadratic functions 
of sample moments (conditional and unconditional) from an auxiliary model, with the latter used 
to define an MSM-type of approach, but with moments of the auxiliary rather than the true
model defining the selection mechanism. 

However, as in the previous section, conditions need to be placed on the relevant criterion
function to ensure the resultant ABC procedure yields consistent inference. This is the content of
Theorem 2. Begin by defining a sample criterion function based on observed data y (respectively,
simulated data $z^i = z(\theta^i)$), $Q(y; \beta)$ (respectively, $Q(z^i; \beta)$), and define
$\hat{\beta}(y)$ (respectively, $\hat{\beta}(z^i)$) as the minimizer of $Q(y; \beta)$ (respectively,
$Q(z^i; \beta)$). For a particular choice of $Q(\cdot; \beta)$ an ABC
algorithm could be based on the summary statistics $\eta(y) = \hat{\beta}(y)$,
$\eta(z^i) = \hat{\beta}(z^i)$.

The above intuition yields ABC Algorithm 5, based on a generic criterion $Q(\cdot; \beta)$:

Algorithm 5 ABC algorithm: auxiliary criterion function

1: Obtain $\hat{\beta}(y) = \arg\min_{\beta \in B} Q(y; \beta)$

2: Simulate $\theta^i$, i = 1, 2, ..., N, from $p(\theta)$

3: Simulate $z^i = (z_1^i, z_2^i, \ldots, z_T^i)'$, i = 1, 2, ..., N, from the likelihood,
$p(\cdot \mid \theta^i)$

4: Select $\theta^i$ such that:

$$d\{\eta(y), \eta(z^i)\} = d\{\hat{\beta}(y), \hat{\beta}(z^i)\} \leq \varepsilon, \qquad (6)$$

where $d\{\cdot, \cdot\}$ is a distance function (or metric), and the tolerance level $\varepsilon$
is chosen as small as the computing budget allows.
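
The four steps of Algorithm 5 can be sketched as below. The sketch is illustrative only and makes several assumptions that are not in the text: the structural model is taken to be an MA(2) process, the criterion $Q(\cdot; \beta)$ is the OLS criterion of an AR(2) auxiliary model, the prior is uniform on a triangular region, and the distance is Euclidean. All function names are hypothetical.

```python
import random

def simulate_ma2(theta, T, rng):
    """Structural model (assumed): y_t = e_t + th1*e_{t-1} + th2*e_{t-2}, e_t ~ N(0, 1)."""
    e = [rng.gauss(0.0, 1.0) for _ in range(T + 2)]
    return [e[t + 2] + theta[0] * e[t + 1] + theta[1] * e[t] for t in range(T)]

def draw_prior(rng):
    """Uniform prior on -2<th1<2, th1+th2>-1, th1-th2<1 (an assumption)."""
    while True:
        th1, th2 = rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)
        if th1 + th2 > -1.0 and th1 - th2 < 1.0:
            return th1, th2

def ols_ar2(y):
    """Minimiser of Q(y; beta) = (1/T) sum_{t=3}^T (y_t - b1*y_{t-1} - b2*y_{t-2})^2."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(2, len(y)):
        a, b, c = y[t - 1], y[t - 2], y[t]
        s11 += a * a
        s12 += a * b
        s22 += b * b
        r1 += a * c
        r2 += b * c
    det = s11 * s22 - s12 * s12          # determinant of the 2x2 normal equations
    return ((r1 * s22 - r2 * s12) / det, (r2 * s11 - r1 * s12) / det)

def abc_auxiliary(y, minimiser, N, eps, rng):
    """Rejection ABC using the criterion minimiser as the summary statistic."""
    beta_obs = minimiser(y)                                   # Step 1
    accepted = []
    for _ in range(N):
        theta = draw_prior(rng)                               # Step 2
        z = simulate_ma2(theta, len(y), rng)                  # Step 3
        beta_sim = minimiser(z)
        dist = sum((a - b) ** 2 for a, b in zip(beta_obs, beta_sim)) ** 0.5
        if dist <= eps:                                       # Step 4
            accepted.append(theta)
    return accepted

if __name__ == "__main__":
    rng = random.Random(0)
    y = simulate_ma2((0.6, 0.2), 300, rng)
    draws = abc_auxiliary(y, ols_ar2, N=2000, eps=0.05, rng=rng)
    print(len(draws), "draws accepted")
```

Passing the minimiser as an argument keeps the sampler agnostic about the criterion, in line with the generic $Q(\cdot; \beta)$ of the algorithm; any other criterion with a computable minimiser could be substituted for `ols_ar2`.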

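Denote the posterior obtained from the above algorithm as $p_\varepsilon(\theta \mid \eta(y))$. The
following result gives conditions under which
$\Pr_\varepsilon(\theta \notin \mathcal{N}_\delta(\theta^0) \mid \eta(y)) \to_p 0$ as $T \to \infty$
and $\varepsilon \to 0$.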

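Theorem 2 For an auxiliary model with parameters $\beta \in B$, $B \subset \mathbb{R}^{d_\beta}$
compact, assume that the following are satisfied:

[G1] There exists a deterministic limit criterion function $Q_\infty(\theta^i; \beta)$ such that

[G1(1)] $Q_\infty(\theta^i; \beta)$ is continuous as a function of $\beta$, uniformly in $\theta^i$;

[G1(2)] $\sup_{\beta \in B} |Q(y; \beta) - Q_\infty(\theta^0; \beta)| = o_p(1)$ and
$\sup_{\theta \in \Theta, \beta \in B} |Q(z(\theta); \beta) - Q_\infty(\theta; \beta)| = o_p(1)$.

[G2] $Q_\infty(\theta^i; \beta)$ has a unique minimum $b(\theta^i)$ for all $\theta^i \in \Theta$; i.e.,
for all $\theta^i \in \Theta$, $b(\theta^i) := \arg\min_{\beta \in B} Q_\infty(\theta^i; \beta)$ and
$\beta^0 := b(\theta^0)$.

[G3] $b(\cdot)$ is one-to-one in $\theta^i$; i.e., $\beta = b(\theta^i)$ has a unique solution for all
$\theta^i \in \Theta$.

If [G1]-[G3] and [P] are satisfied, then
$\Pr_\varepsilon(\theta \notin \mathcal{N}_\delta(\theta^0) \mid \eta(y)) \to_p 0$ as $T \to \infty$
and $\varepsilon \to 0$.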

Remark 10 The above result states that, so long as $Q(\cdot; \beta)$ satisfies standard properties
([G1], [G2]), and if the so-called binding function $b(\cdot)$ is one-to-one, an ABC algorithm that
uses as summary statistics the minimizers of $Q(\cdot; \beta)$ will yield a posterior that is
degenerate at $\theta^0$. For a specific objective function, conditions [G1] and [G2] are generally
satisfied under more primitive conditions; see, for example, Jennrich (1969) in the setting where
$Q(\cdot; \beta)$ is the nonlinear least squares criterion, and Newey and McFadden (1994) in the
case where $Q(\cdot; \beta)$ is a minimum distance criterion. While the result of Theorem 2 is
intuitive, it is nonetheless important as it illustrates that we are not confined to using simple
summary statistics of the data or the log-likelihood function $L(y; \beta)$ of the auxiliary model
within ABC. Instead, any criterion function satisfying [G1]-[G3] can be used to generate valid
summary statistics for use in ABC.
Remark 11 An alternative to Algorithm 5 is to replace the summary statistics
$\eta(y) = \hat{\beta}(y)$, $\eta(z^i) = \hat{\beta}(z^i)$ in Step (3) with a distance based on
$(\partial/\partial\beta) Q(z^i; \hat{\beta}(y))$; e.g.,

$$d\{\eta(y), \eta(z^i)\} = \sqrt{\left[(\partial/\partial\beta) Q(z^i; \hat{\beta}(y))\right]' \Omega(y) \left[(\partial/\partial\beta) Q(z^i; \hat{\beta}(y))\right]}, \qquad (7)$$

for some positive definite weighting matrix $\Omega(y)$. Such an algorithm would be quite useful
in situations where $(\partial/\partial\beta) Q(z^i; \hat{\beta}(y))$ is known in closed form and
would (in all cases) lead to an ABC algorithm that is several orders of magnitude faster than one
based on computing $\hat{\beta}(z^i)$ at every value $\theta^i$. Under conditions similar to those
in Theorem 2, a consistency result will hold for the posterior obtained from an ABC algorithm that
uses the distance measure in (7). We omit this proof for brevity.
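
A distance of the form (7) can be sketched for one concrete case. The example below assumes, purely for illustration, the OLS criterion of an AR(2) auxiliary model, for which the score is available in closed form, and an identity weighting matrix as the default; neither choice is prescribed by the text, and the function names are hypothetical.

```python
import math
import random

def ols_ar2(y):
    """Minimiser of Q(y; beta) = (1/T) sum_{t=3}^T (y_t - b1*y_{t-1} - b2*y_{t-2})^2."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(2, len(y)):
        a, b, c = y[t - 1], y[t - 2], y[t]
        s11 += a * a
        s12 += a * b
        s22 += b * b
        r1 += a * c
        r2 += b * c
    det = s11 * s22 - s12 * s12
    return ((r1 * s22 - r2 * s12) / det, (r2 * s11 - r1 * s12) / det)

def score_ar2(z, beta):
    """Closed-form gradient (d/d beta) Q(z; beta) of the AR(2) OLS criterion."""
    T = len(z)
    g1 = g2 = 0.0
    for t in range(2, T):
        resid = z[t] - beta[0] * z[t - 1] - beta[1] * z[t - 2]
        g1 += -2.0 * resid * z[t - 1]
        g2 += -2.0 * resid * z[t - 2]
    return (g1 / T, g2 / T)

def score_distance(z, beta_obs, omega=((1.0, 0.0), (0.0, 1.0))):
    """sqrt(s' * Omega * s), with s the score of Q(z; .) evaluated at beta_hat(y)."""
    s = score_ar2(z, beta_obs)
    quad = sum(s[a] * omega[a][b] * s[b] for a in range(2) for b in range(2))
    return math.sqrt(max(quad, 0.0))

if __name__ == "__main__":
    rng = random.Random(0)
    y = [rng.gauss(0.0, 1.0) for _ in range(500)]
    z = [rng.gauss(0.0, 1.0) for _ in range(500)]
    beta_obs = ols_ar2(y)              # computed once, from the observed data only
    print(score_distance(z, beta_obs))
```

The computational saving arises because `ols_ar2` is called once on the observed data, while each simulated data set requires only one evaluation of the score rather than a fresh minimisation.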

5.2 The Role of the Auxiliary Model 

Intimately tied to the idea of choosing a suitable criterion function is the choice of the auxiliary 
model from which the criterion function is computed. If the chosen auxiliary model is a poor 
representation of the observed data it is likely that no criterion function, likelihood or otherwise, 
will produce adequate summary statistics upon which to base our ABC algorithm. In this way 
using summary statistics from an auxiliary model inside of ABC is not a panacea. 

ABC algorithms based on an auxiliary model and with summary statistics derived from a
criterion function $Q(\cdot; \beta)$ can fail for precisely the same reason ABC based on arbitrary
summary statistics can fail, namely, failure of [G3] (respectively [S2(2)]). Satisfaction of [G3] is
affected by both the choice of the auxiliary model and the subsequent criterion function used to
obtain $\eta(y) = \hat{\beta}(y)$. Since the choice of auxiliary model and criterion
$Q(\cdot; \beta)$ are user and example specific, attempting to give hard and fast guidelines for how
one should choose either is a research topic in its own right. Rather, we simply advocate that
validation of [G1]-[G3] should at least be attempted for any specified combination (of model and
criterion function) before implementing an ABC algorithm. In the following example we provide
support for this statement by illustrating a case in which consistency is not yielded via what
seems to be a sensible ABC specification: namely the use of an AR(2) auxiliary model along with an
OLS criterion function to produce inference about the true parameters of an MA(2) model.

Example 4 Consider again the MA(2) model from Example 2. Instead of a summary statistic
based ABC approach, consider implementing ABC using summary statistics generated via the
OLS criterion function for the AR(2) auxiliary model:
$y_t = \beta_1 y_{t-1} + \beta_2 y_{t-2} + \nu_t$, with $\nu_t \sim (0, 1)$. Using

$$Q(y; \beta) = \frac{1}{T} \sum_{t=3}^{T} (y_t - \beta_1 y_{t-1} - \beta_2 y_{t-2})^2 \qquad (8)$$

and OLS estimator $\hat{\beta}(y) = (\hat{\beta}_1(y), \hat{\beta}_2(y))'$, the summary statistic
$\eta(y) = (\hat{\beta}_1(y), \hat{\beta}_2(y))'$, which has a simple closed form, can be used to
build a computationally simple ABC algorithm.

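Given the particular structure of $Q(\cdot; \beta)$ in (8), and under the parameter restrictions
imposed on $\theta$, [G1] and [G2] are satisfied. Therefore, all that remains is to verify [G3].
Differentiating the limit criterion $Q_\infty(\theta^i; \beta)$ with respect to
$\beta = (\beta_1, \beta_2)'$ yields the following equations

$$E\left(z_{t-1}^i (z_t^i - \beta_1 z_{t-1}^i - \beta_2 z_{t-2}^i)\right) = 0,$$
$$E\left(z_{t-2}^i (z_t^i - \beta_1 z_{t-1}^i - \beta_2 z_{t-2}^i)\right) = 0. \qquad (9)$$

Defining the autocovariances based on $\theta^i$ as
$\gamma_0 = 1 + (\theta_1^i)^2 + (\theta_2^i)^2$, $\gamma_1 = \theta_1^i + \theta_1^i \theta_2^i$ and
$\gamma_2 = \theta_2^i$, we can re-write (9) as

$$\gamma_1 - \beta_1 \gamma_0 - \beta_2 \gamma_1 = 0,$$
$$\gamma_2 - \beta_1 \gamma_1 - \beta_2 \gamma_0 = 0. \qquad (10)$$

Solving for $\beta_1$ and $\beta_2$ in (10) yields the following:

$$\beta_1(\theta^i) = \frac{\frac{\gamma_1}{\gamma_0}\left(1 - \frac{\gamma_2}{\gamma_0}\right)}{1 - \left(\frac{\gamma_1}{\gamma_0}\right)^2}, \qquad \beta_2(\theta^i) = \frac{\frac{\gamma_2}{\gamma_0} - \left(\frac{\gamma_1}{\gamma_0}\right)^2}{1 - \left(\frac{\gamma_1}{\gamma_0}\right)^2}.$$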

Interestingly, and as an illustration of the point made earlier, the binding function
$b(\theta^i) = (\beta_1(\theta^i), \beta_2(\theta^i))'$ does not admit a unique solution to
$0 = b(\theta^0) - b(\theta^i)$, for all $\theta^0$.

For instance, simple numerical calculations reveal that if
$\theta^0 = (\theta_1^0, \theta_2^0)' = (.6, .2)'$ the equation $0 = b(\theta^0) - b(\theta^i)$ has a
unique solution satisfying the parameter restrictions on $\theta$, namely $\theta^i = \theta^0$ (a
second solution, $\theta^i = (3, 5)'$, exists but does not satisfy the parameter restrictions).
However, if $\theta^0 = (.5, .5)'$, the equation $0 = b(\theta^0) - b(\theta^i)$ has two solutions
satisfying the parameter restrictions, $\theta^i = (.5, .5)'$ and $\theta^i = (1, 2)'$. Therefore,
$b(\theta^i) = (\beta_1(\theta^i), \beta_2(\theta^i))'$ is not a one-to-one function
and hence will not yield consistent inference in general.
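
The numerical claims of Example 4 can be checked directly from the autocovariances and the solution of the two normal equations. The following short sketch assumes unit-variance innovations, as in the example; the function name `binding` is hypothetical.

```python
def binding(theta):
    """Limit of the AR(2) OLS estimator when the data follow an MA(2) with parameter theta."""
    th1, th2 = theta
    g0 = 1.0 + th1 ** 2 + th2 ** 2      # gamma_0
    g1 = th1 + th1 * th2                # gamma_1
    g2 = th2                            # gamma_2
    r1, r2 = g1 / g0, g2 / g0           # first two autocorrelations
    beta1 = r1 * (1.0 - r2) / (1.0 - r1 ** 2)
    beta2 = (r2 - r1 ** 2) / (1.0 - r1 ** 2)
    return beta1, beta2

if __name__ == "__main__":
    for theta in ((0.6, 0.2), (3.0, 5.0), (0.5, 0.5), (1.0, 2.0)):
        print(theta, binding(theta))
```

Because the binding function depends on $\theta$ only through the autocorrelations $\gamma_1/\gamma_0$ and $\gamma_2/\gamma_0$, any two parameter values sharing these ratios are mapped to the same point, which is the source of the multiple solutions.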


6 Consistency of ABC in Ordinary Differential Equations Models

In this section we investigate the ability of ABC to yield Bayesian consistent inference for
parameters governing a system of ordinary differential equations (ODEs). As will be demonstrated,
this particular type of application, which has been given some attention in the ABC literature
(see, for example, Toni et al., 2009, Sun et al., 2014, Prangle, 2015), highlights certain
important issues related to Bayesian consistency of ABC-based posterior estimates. In particular, by
checking the conditions of Theorem 2 in a simple deterministic system, we demonstrate that
ABC can yield inconsistent inference in such settings, highlighting the importance of these
conditions for verifying the validity of ABC-based inference. While we specifically focus on a simple
deterministic system, these findings can easily be generalized to other ODEs.

Specifically, we give our attention to the Lotka-Volterra (LV) model, which describes the
interaction between a species $x_1$, referred to as the prey species, and a species $x_2$, referred
to as the predator species. For $\theta = (\theta_1, \theta_2)'$ unknown, we consider the
deterministic LV model defined through the system of ODEs:

$$\frac{dx_1}{dt} = \theta_1 x_1 - x_1 x_2,$$
$$\frac{dx_2}{dt} = \theta_2 x_1 x_2 - x_2. \qquad (11)$$

For any point $t_i$ in the interval [0, T], the vector $x(t_i) = (x_1(t_i), x_2(t_i))'$ is the
solution to the above ODEs, with initial value at $t_0$. Typically, it is assumed that we do not
observe $x(t_i)$; rather, we observe a quantity corresponding to $x(t_i)$ that is measured with
error that is both additive and independent over observational points; see, for example, Beck and
Arnold (1977). Following this usual practice then, we specify a measurement equation of the form

$$y(t_i) = x(t_i) + v(t_i), \qquad (12)$$

where $v(t_i) \sim$ i.i.d. $(0, \Sigma_v)$ and $\Sigma_v$ is diagonal.^9


Assume we have an observed sample of size $R_T$ from (12), with corresponding design points
$t_1, \ldots, t_{R_T}$, fixed or random. Our goal is to estimate the posterior density of $\theta$
using the observed sample $\{y(t_i)\}_{i=1}^{R_T}$, and prior density $p(\theta)$. Toni et al. (2009)
propose to estimate these posteriors via ABC using the squared distance between the observed and
simulated samples. Specifically, for $\{y(t_i)\}_{i=1}^{R_T}$ the observed sample and
$\{z(t_i; \theta)\}_{i=1}^{R_T}$ the simulated sample, obtained by solving equation (11) at
$\theta = (\theta_1, \theta_2)'$, ABC is based on the distance

$$\rho\{y, z(\theta)\} = \frac{1}{R_T} \sum_{i=1}^{R_T} \sum_{j=1}^{2} \left(y_j(t_i) - z_j(t_i; \theta)\right)^2. \qquad (13)$$

That is, draws of $\theta$ are retained according to the proximity of the stochastic quantity
$y_j(t_i)$ to the deterministic quantity $z_j(t_i; \theta)$.

It is critical to note, however, that choosing values of $\theta^i$ such that
$\rho\{y, z(\theta^i)\} \leq \varepsilon$ will not yield an ABC-based posterior that is Bayesian
consistent. This can be seen by noting that as $R_T \to \infty$, even if we select $\theta^0$, and
so $x_j(t_i) = z_j(t_i; \theta^0)$ for all $t_i$ and j = 1, 2, it will be the case that
$\rho\{y, z(\theta^0)\} \to_p E(v_1^2(t_i) + v_2^2(t_i)) > \varepsilon$, for $\varepsilon$
arbitrarily small. Therefore, there exists no value of $\theta^i \in \Theta$ for which
$\rho\{y, z(\theta^i)\} \leq \varepsilon$ as $R_T \to \infty$ and $\varepsilon \to 0$, and so the
ABC-based posterior defined by the distance $\rho\{y, z(\theta)\}$ cannot be Bayesian consistent.
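
The failure described above can be illustrated numerically. The sketch below makes several assumptions that are not in the text: the distance is taken in averaged form (divided by the number of design points), the true parameter is set to $(1, 1)'$ with initial state $(1, 0.5)'$, the measurement errors are Gaussian with common standard deviation `sigma`, and the ODE is solved with a fixed-step classical Runge-Kutta scheme. All names are hypothetical.

```python
import random

def lv_rhs(x1, x2, th1, th2):
    """Right-hand side of the LV system: dx1/dt = th1*x1 - x1*x2, dx2/dt = th2*x1*x2 - x2."""
    return th1 * x1 - x1 * x2, th2 * x1 * x2 - x2

def solve_lv(theta, x0, times, h=0.01):
    """Fixed-step RK4 from t = 0; returns the state at each (increasing) requested time."""
    th1, th2 = theta
    x1, x2 = x0
    t = 0.0
    out = []
    for target in times:
        while t < target - 1e-12:
            step = min(h, target - t)
            k1 = lv_rhs(x1, x2, th1, th2)
            k2 = lv_rhs(x1 + 0.5 * step * k1[0], x2 + 0.5 * step * k1[1], th1, th2)
            k3 = lv_rhs(x1 + 0.5 * step * k2[0], x2 + 0.5 * step * k2[1], th1, th2)
            k4 = lv_rhs(x1 + step * k3[0], x2 + step * k3[1], th1, th2)
            x1 += step * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
            x2 += step * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
            t += step
        out.append((x1, x2))
    return out

def mean_sq_distance(y, z):
    """Average over design points of the squared distance, summed over both states."""
    return sum((a1 - b1) ** 2 + (a2 - b2) ** 2
               for (a1, a2), (b1, b2) in zip(y, z)) / len(y)

def distance_at_truth(R, sigma, seed=0):
    """Distance between noisy observations and the noise-free solution at the true theta."""
    rng = random.Random(seed)
    theta0, x0 = (1.0, 1.0), (1.0, 0.5)          # illustrative values only
    times = [10.0 * (i + 1) / R for i in range(R)]
    x = solve_lv(theta0, x0, times)
    y = [(a + rng.gauss(0.0, sigma), b + rng.gauss(0.0, sigma)) for a, b in x]
    return mean_sq_distance(y, x)

if __name__ == "__main__":
    for R in (100, 1000, 10000):
        print(R, distance_at_truth(R, sigma=0.5))
```

With `sigma = 0.5` the expected value of $v_1^2 + v_2^2$ is 0.5, so the printed distances should settle near that value rather than near zero as the number of design points grows, even though the simulated path is generated at the true parameter.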


^9 It could be assumed that the evolution of $x_1$ and $x_2$ is stochastic rather than deterministic; however the
adoption (or not) of this assumption is not germane to our discussion and we thus retain the ODE structure for
the states.

However, an alternative to the “distance” $\rho\{y, z(\theta)\}$ in (13) is a metric based on
statistics obtained from minimizing an objective function representing the data in equation (12). A
common means of obtaining (frequentist) point estimates for parameters defined by ODEs is
nonlinear least squares (NLS), whereby the squared distance between the observed and simulated
solutions is minimized (see Beck and Arnold, 1977, for a discussion). This then motivates us to
consider the consistency properties of an ABC method that mimics the spirit of NLS. As such,
we consider as summary statistics for use in ABC, the parameters that minimize the ordinary
least squares (OLS) criterion


2 Rt 

Qir, = 

P2,j 


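with respect to $\beta = (\beta_1', \beta_2')'$, $\beta_1 = (\beta_{1,1}, \beta_{1,2})'$,
$\beta_2 = (\beta_{2,1}, \beta_{2,2})'$, which defines
$\hat{\beta}_{1,j} = \frac{1}{R_T} \sum_{i=1}^{R_T} y_j(t_i)$, the sample mean, and
$\hat{\beta}_{2,j} = \frac{1}{R_T} \sum_{i=1}^{R_T} (y_j(t_i) - \hat{\beta}_{1,j})^2$, the sample
variance. ABC can then be conducted using
$\hat{\beta}(y) = (\hat{\beta}_1(y)', \hat{\beta}_2(y)')'$ and its simulated counterpart
$\hat{\beta}(z^i) = (\hat{\beta}_1(z^i)', \hat{\beta}_2(z^i)')'$, with a distance of the form
specified in (6) adopted. Further alternatives can be defined by basing ABC on matching
$\hat{\beta}_1(y)$ alone (respectively, $\hat{\beta}_2(y)$) with its simulated
counterpart $\hat{\beta}_1(z^i)$ (respectively, $\hat{\beta}_2(z^i)$), with the use of
$\hat{\beta}_1(\cdot)$ alone as the matching statistic being closest in spirit to NLS.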

Sufficient conditions guaranteeing that ABC will yield consistent inference are given in The- 
oremj^and must be verihed, for each version of T/(y), '? 7 (z*) obtained from Q(-; /3). Whilst formal 
verihcation of the identihcation condition in this case is complicated by the fact that x(tj) 
has no closed form, some analytical insights are attainable, by noting the following. Dehne 
Xj = lim^.^ 

—>00 —>■00 EfJi and corresponding simulated coun¬ 
terparts Zj{6) = lim^^^^oo z]{e) = lim^^^oo for j = 1,2. 

Assuming these quantities exist, it can be shown that 





where is the {j,j) element of S^. However, it is also the case that 


lim 

Rt^oo 


^ z%e) - {z,{e)f. 


Hence, as Rt —)■ 00 


32 (y) - 32(2(0°)) 


7 ^ Op(l) 


and so there is no hope that ABC based on $\eta(y) = \beta_2(y)$ will yield consistent inference. That 
is, and reverting to the general notation of the previous section, $Q(y;\beta)$ and $Q(z(\theta^0);\beta)$ do not 
have a common limit $Q_\infty(\theta^0;\beta)$, which violates Assumption [G1] of Theorem 2. This same 
point pertains to the case in which the augmented statistic $\beta(y)$ is used. 
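
The size of this gap is easy to see numerically. The sketch below is illustrative only and rests on stated assumptions: a sine wave stands in for the (unavailable in closed form) ODE solution, the measurement error is Gaussian with a hypothetical standard deviation `sigma`, and the simulated data are the noise-free path at the true parameter. Under these assumptions the difference in sample variances should settle near `sigma ** 2` rather than near zero.

```python
import math
import random
from statistics import fmean

def sample_variance(values):
    """Sample variance with the 1/R_T normalisation."""
    m = fmean(values)
    return fmean((v - m) ** 2 for v in values)

rng = random.Random(0)
R_T = 20000
sigma = 0.5  # hypothetical measurement-error standard deviation

# Stand-in for the ODE solution at the true parameter (an assumption:
# any bounded deterministic path makes the same point).
x = [math.sin(0.01 * i) for i in range(R_T)]

y = [xi + rng.gauss(0.0, sigma) for xi in x]  # observed: signal plus noise
z = x                                          # simulated: signal only

# Difference between the observed and simulated variance statistics.
gap = sample_variance(y) - sample_variance(z)
```

Increasing `R_T` does not shrink `gap`; it only makes it a more precise estimate of the measurement-error variance.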

The critical insight from the above illustration, as it pertains to ABC, is that a mismatch 
between the assumed processes for the observed and simulated data, with the latter failing 
to replicate the stochastic nature of the former, can create a fundamental disconnect between 
the matching statistics formed from the two sets of data, so that they will never coincide, no 
matter the proximity of the drawn parameter vector to the truth. We now contrast this with an 
alternative approach in which we deliberately draw simulated data according to the measurement 
equation 

$$z(t_i;\theta) = w(t_i;\theta) + \nu(t_i), \tag{14}$$

where $w(t_i;\theta)$ is the numerical solution of the ODE at parameter value $\theta$ and $\nu(t_i)$ is a random 
error drawn from the same distribution as the measurement error in the observed data. In this case, 
it is easy to verify that the simulated statistics $\beta_2(z^i)$ and $\beta(z^i)$ depend on the measurement 
error variance in the same manner as the statistics computed from the observed data, with 
Assumption [G1] of Theorem 2 no longer violated as a consequence, and so, as $R_T \to \infty$, 

$$\|\beta_2(y) - \beta_2(z(\theta^0))\| = o_p(1), \quad \text{and} \quad \|\beta(y) - \beta(z(\theta^0))\| = o_p(1).$$

Once again, since no closed form solution exists for the state process, establishing the 
identification condition analytically, as in the previous examples, is not feasible. However, numerical 
exploration indicates the existence of consistency for matching statistics $\beta_1(y)$, $\beta_2(y)$ and $\beta(y)$ 
when data is simulated according to (14).$^{10}$ 
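
The contrast between the two simulation schemes can be sketched with a toy accept/reject ABC run. Everything specific here is an assumption made for illustration: the ODE $dx/dt = \theta\cos(t)$ with $x(0)=0$ replaces the Lotka-Volterra system, the solver is a plain Euler scheme, the prior is uniform on $[0.5, 2]$, the tolerance is set implicitly by keeping the closest draws, and only the variance statistic is matched.

```python
import math
import random
from statistics import fmean

def euler_path(theta, n_steps, h):
    """Euler solution w(t_i; theta) of the toy ODE dx/dt = theta*cos(t), x(0) = 0."""
    x, path = 0.0, []
    for i in range(n_steps):
        path.append(x)
        x += h * theta * math.cos(i * h)
    return path

def variance_stat(values):
    """Variance matching statistic with the 1/R_T normalisation."""
    m = fmean(values)
    return fmean((v - m) ** 2 for v in values)

def abc_posterior_mean(y_stat, add_noise, rng, n_draws=300, keep=15,
                       n_steps=2000, h=0.05, sigma=0.5):
    """Mean of the `keep` prior draws whose variance statistic is closest to y_stat."""
    draws = []
    for _ in range(n_draws):
        theta = rng.uniform(0.5, 2.0)              # prior draw
        z = euler_path(theta, n_steps, h)
        if add_noise:                               # measurement equation (14)
            z = [w + rng.gauss(0.0, sigma) for w in z]
        draws.append((abs(variance_stat(z) - y_stat), theta))
    draws.sort()                                    # smallest distances first
    return fmean(theta for _, theta in draws[:keep])

rng = random.Random(1)
theta0, sigma = 1.0, 0.5
y = [w + rng.gauss(0.0, sigma) for w in euler_path(theta0, 2000, 0.05)]
y_stat = variance_stat(y)

mean_noise_free = abc_posterior_mean(y_stat, add_noise=False, rng=rng)
mean_with_noise = abc_posterior_mean(y_stat, add_noise=True, rng=rng)
```

In this toy setting the noise-free simulator can only match the observed variance by inflating $\theta$ (roughly to $\sqrt{\theta_0^2 + 2\sigma^2}$, since the signal variance is close to $\theta^2/2$), whereas simulating according to (14) keeps the accepted draws near $\theta_0$.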


It has generally been recognized that consistent inference for point estimates of parameters 
in ODEs is due to the additive nature of the measurement error, combined with the fact that 
the measurement error has mean zero, known variance, and is independent of the data, as well 
as the satisfaction of identification conditions guaranteeing the existence of a unique minimum 
at $\theta^0$; see, for example, Beck and Arnold (1977). However, when conducting inference for ODEs 
via ABC, we see that in addition to these conditions (or variants thereof), care must be taken 
to ensure that data is simulated in such a way that it matches the observed data. It is the price 
we pay for conducting complete inference using a simulation-based procedure. 


7 Discussion 

Consistency is one of the most fundamental properties with which to gauge the output of a 
statistical inference procedure. With our focus on Bayesian consistency, we demonstrate that 
in the limit (as both $T \to \infty$ and $\varepsilon \to 0$) the ABC posterior estimate will be degenerate at the 
true parameter (vector) if (and only if!) the summary statistics upon which ABC is based are 
appropriately chosen. Conditions guaranteeing Bayesian consistency of ABC posterior estimates 
for a wide range of summary statistics, with and without respect to an auxiliary model, with the 
former defined with respect to an arbitrary criterion function, are developed and several examples 
featured in the literature are used to illustrate these conditions. The results are less heartening 
than expected and demonstrate that consistent inference in ABC is in no way guaranteed. In 
general, we find that ABC will only yield consistent inference when a judicious choice of summary 
statistics has been employed, subsequently calling into question a large collection of ABC results 
based on arbitrary summary statistics, as well as those generated from well-specified auxiliary 
models. In addition, our results highlight the need both to specify a proper distance measure 
and to ensure an exact match between the process assumed to have generated the observed data 
and that used to produce simulated samples, in order to have any hope of yielding consistent 
inference. 

$^{10}$ Numerical results illustrating consistency are available from the authors upon request. 

To determine if ABC will be Bayesian consistent in practice, we develop a useful and 
computationally simple diagnostic procedure that can be applied to any given data set and any choice 
of summary statistics. This procedure constitutes an important first step in determining, in any 
practical situation, whether ABC will yield consistent inference. Formalization of this diagnostic 
procedure, as well as work detailing its theoretical properties, is a topic of ongoing research by 
the authors. 

Before closing, we re-emphasize the fact that the results presented herein, while cast within 
the framework of the ABC accept/reject algorithm, apply to the more sophisticated variants 
of the ABC method. In particular, the results are applicable to ABC algorithms that generate 
summary statistics through various simulation-based approximations, as well as algorithms that 
utilize more efficient methods of post-sampling density estimation. Given this fact, the results 
discussed herein can be used to form the basic foundation for determining Bayesian consistency 
for all summary statistic-based ABC algorithms. Further, the key issue that we have emphasized 
throughout, namely the need to verify the relevant conditions for Bayesian consistency, including 
the required one-to-one property of the (implied) binding function, is just as pertinent, of course, 
to related frequentist simulation-based inference methods. In particular, the development of a 
formal and rigorous method for confirming the one-to-one property of the binding function is 
as critical to the establishment of the (asymptotic) validity of all other such methods as it is to 
ABC. 

8 Appendix: Proofs 

Proof of Theorem 1. The proof is broken into three parts: first, we show that the only value 
$\theta^i$ that will be selected for all $\varepsilon > 0$ as $T \to \infty$ is $\theta^i = \theta^0$; second, we demonstrate that for 
any $\varepsilon > 0$ there exists some $N(\varepsilon)$ such that if $N > N(\varepsilon)$ the posterior density $p_\varepsilon(\theta|\eta(y))$ has a 
well-defined probability limit; lastly, we use these two pieces to demonstrate that for any $\mathcal{N}_\delta(\theta^0)$ 
and $A_\delta := \Theta/\mathcal{N}_\delta(\theta^0)$, the posterior probability $\Pr_\varepsilon(\theta \in A_\delta|\eta(y)) \to 0$. 

Part 1: 
By the triangle inequality 

$$d\{\eta(y), \eta(z^i)\} \leq d\{\eta(y), b(\theta^i)\} + d\{b(\theta^i), \eta(z^i)\}. \tag{15}$$

Applying the triangle inequality again to the first term on the right-hand side of (15) yields 

$$d\{\eta(y), \eta(z^i)\} \leq d\{b(\theta^0), b(\theta^i)\} + d\{\eta(y), b(\theta^0)\} + d\{b(\theta^i), \eta(z^i)\}.$$

By [S1], $\eta(y) \to_p b(\theta^0)$ and so $d\{\eta(y), b(\theta^0)\} = o_p(1)$. In addition, 

$$d\{\eta(z^i), b(\theta^i)\} \leq \sup_{\theta\in\Theta} d\{\eta(z(\theta)), b(\theta)\},$$

and by [S2(1)] $\sup_{\theta\in\Theta} d\{\eta(z(\theta)), b(\theta)\} = o_p(1)$. Combining these facts yields 

$$d\{\eta(y), \eta(z^i)\} \leq d\{b(\theta^0), b(\theta^i)\} + o_p(1). \tag{16}$$

For fixed $\varepsilon > 0$, as $T \to \infty$ a value $\theta^i$ will be selected if 

$$d\{b(\theta^0), b(\theta^i)\} + o_p(1) \leq \varepsilon.$$

By [S2(2)] the only value of $\theta^i \in \Theta$ for which $b(\theta^i) = b(\theta^0)$ is $\theta^i = \theta^0$. Therefore, as $T \to \infty$, 
the only value of $\theta^i$ satisfying $d\{b(\theta^0), b(\theta^i)\} \leq \varepsilon$ for any $\varepsilon > 0$ is $\theta^i = \theta^0$. 
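
The selection rule analysed in Part 1 can be illustrated with a toy accept/reject run. This is a sketch under stated assumptions, not the paper's experiment: the data are taken to be $T$ draws from $N(\theta, 1)$, the summary is the sample mean so that $b(\theta) = \theta$ is one-to-one, the prior is uniform on $[-2, 2]$, and the summary of each simulated sample is drawn directly from its exact $N(\theta, 1/T)$ distribution to keep the run short.

```python
import random

def abc_accept(eta_y, eps, T, n_draws, rng):
    """Return prior draws theta^i whose simulated summary lies within eps of eta_y."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-2.0, 2.0)               # theta^i ~ p(theta)
        eta_z = theta + rng.gauss(0.0, T ** -0.5)   # mean of T draws from N(theta, 1)
        if abs(eta_y - eta_z) <= eps:                # d{eta(y), eta(z^i)} <= eps
            accepted.append(theta)
    return accepted

rng = random.Random(42)
theta0, T = 0.3, 10_000
eta_y = theta0 + rng.gauss(0.0, T ** -0.5)          # observed summary

# Largest distance from theta0 among accepted draws, for shrinking tolerances.
worst = {}
for eps in (0.5, 0.1, 0.02):
    kept = abc_accept(eta_y, eps, T, n_draws=20_000, rng=rng)
    worst[eps] = max(abs(th - theta0) for th in kept)
```

Because $b(\cdot)$ is injective here, the accepted draws are confined to a band around $\theta^0$ whose width is governed by $\varepsilon$ plus the $o_p(1)$ sampling error in the two summaries.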


Part 2: 


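Part 1 suggests that for small enough $\varepsilon$ the posterior density $p_\varepsilon(\theta|\eta(y))$ will be zero for values 
$\theta \in A_\delta$ as $T \to \infty$. However, because $p_\varepsilon(\theta|\eta(y))$ is built from $N$ random draws, for any $\varepsilon > 0$ 
we must ensure that $N$ can be chosen large enough so that $\mathrm{plim}_{T\to\infty}\, p_\varepsilon(\theta|\eta(y))$ exists for any 
$\varepsilon > 0$. 

By compactness of $\Theta$ and Assumption [P], for any $r > 0$ there exists a finite integer $N(r)$ 
and points $\theta^1, \ldots, \theta^{N(r)}$, each drawn according to $\theta^i \sim p(\theta)$, such that 

$$\Theta \subseteq \bigcup_{i=1}^{N(r)} \mathcal{N}_r(\theta^i) \quad \text{w.p.1.}$$

By continuity of $b(\cdot)$, there exists an $r(\varepsilon) > 0$ such that $\|\theta - \theta^0\| < r(\varepsilon)$ implies $d\{b(\theta^0), b(\theta)\} \leq \varepsilon$ 
for any $\varepsilon > 0$. Combining the two ideas we see that for any $\varepsilon > 0$, we can cover $\Theta$ with $N(r(\varepsilon))$ 
balls w.p.1. Now, note that by Assumption [P] and the above argument, for any $\varepsilon > 0$, we can 
find a radius $r(\varepsilon)$ such that $\theta^0 \in \mathcal{N}_{r(\varepsilon)}(\theta^i)$ for some $\theta^i$, and $d\{b(\theta^0), b(\theta^i)\} \leq \varepsilon$ w.p.1 as a 
consequence. In addition, by Assumptions [S1], [S2(1)] and Part 1 of the proof, 

$$d\{\eta(y), \eta(z(\theta^i))\} = d\{b(\theta^0), b(\theta^i)\} + o_p(1)$$

for any $\theta^i \in \Theta$ and so 

$$d\{\eta(y), \eta(z(\theta^i))\} \leq \varepsilon + o_p(1)$$

for $\theta^i$ such that $\theta^0 \in \mathcal{N}_{r(\varepsilon)}(\theta^i)$. Using this fact, we have that 

$$\mathbf{1}_\varepsilon[z(\theta^i)] = \mathbf{1}[d\{\eta(y), \eta(z(\theta^i))\} \leq \varepsilon] = \mathbf{1}[d\{b(\theta^0), b(\theta^i)\} \leq \varepsilon] + o_p(1) = 1 + o_p(1).$$

From here we see that for $T$ arbitrarily large and any $\varepsilon > 0$, there exists some $N(\varepsilon) := N(r(\varepsilon))$ 
such that for $N > N(\varepsilon)$ 

$$p_\varepsilon(\theta|\eta(y)) = \frac{\int_z p(\theta)\, p(z|\theta)\, \mathbf{1}_\varepsilon[z(\theta)]\, dz}{\int_\Theta \int_z p(\theta)\, p(z|\theta)\, \mathbf{1}_\varepsilon[z(\theta)]\, dz\, d\theta} \tag{17}$$

exists. 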


Part 3: 


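We now use Parts 1 and 2 to show that $\Pr_\varepsilon(\theta \in A_\delta|\eta(y)) \to_p 0$, where $A_\delta := \Theta/\mathcal{N}_\delta(\theta^0)$. By 
Markov's inequality 

$$\lim_{T\to\infty} \Pr\left[\Pr{}_\varepsilon(\theta \in A_\delta|\eta(y)) > \zeta\right] \leq \lim_{T\to\infty} E\left[\Pr{}_\varepsilon(\theta \in A_\delta|\eta(y))\right] / \zeta, \tag{18}$$

for all $\zeta > 0$, and the result follows if the left-hand side of (18) is zero. By the definition of 
$p_\varepsilon(\theta|\eta(y))$, $\Pr_\varepsilon(\theta \in A|\eta(y)) \leq 1$ for any $A \subset \Theta$, $\varepsilon > 0$, and $T \geq 1$. By the bounded convergence 
theorem 

$$\lim_{T\to\infty} E\left[\Pr{}_\varepsilon(\theta \in A_\delta|\eta(y))\right] = E\left[\mathrm{plim}_{T\to\infty} \Pr{}_\varepsilon(\theta \in A_\delta|\eta(y))\right].$$

By the definition of $p_\varepsilon(\theta|\eta(y))$ in (17), $\Pr_\varepsilon(\theta \in A_\delta|\eta(y)) = o_p(1)$ only if, for some $\varepsilon > 0$, 

$$\sup_{\theta\in A_\delta} \mathbf{1}[d\{\eta(y), \eta(z(\theta))\} \leq \varepsilon] = o_p(1).$$

For any $\delta > 0$, if $\|\theta - \theta^0\| > \delta$, by injectivity of $b(\cdot)$, it follows that $d\{b(\theta), b(\theta^0)\} > \varepsilon_*$ for 
some $\varepsilon_* > 0$. By compactness of $A_\delta$ and continuity of $b(\cdot)$, there exists some $\theta_*$ (not necessarily 
unique) such that 

$$\theta_* = \arg\inf_{\theta\in A_\delta} d\{b(\theta^0), b(\theta)\}, \tag{19}$$

and, by injectivity of $b(\cdot)$, for some $\varepsilon_* > 0$ 

$$d\{b(\theta^0), b(\theta_*)\} \geq \varepsilon_* > 0. \tag{20}$$

Moreover, by Assumptions [S1], [S2(1)], and equation (16) in Part 1, 

$$\sup_{\theta\in A_\delta} \left|\mathbf{1}_\varepsilon[z(\theta)] - \mathbf{1}_\varepsilon[b(\theta)]\right| = \sup_{\theta\in A_\delta} \left|\mathbf{1}[d\{\eta(y), \eta(z(\theta))\} \leq \varepsilon] - \mathbf{1}[d\{b(\theta^0), b(\theta)\} \leq \varepsilon]\right| = o_p(1). \tag{21}$$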
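From equation (21) it follows that 

$$\mathbf{1}[d\{\eta(y), \eta(z(\theta_*))\} \leq \varepsilon] = \mathbf{1}[d\{b(\theta^0), b(\theta_*)\} \leq \varepsilon] + o_p(1).$$

By the definition of $\theta_*$ in (19) and equation (20), for any $\varepsilon^* < \varepsilon_*$ 

$$\mathbf{1}[d\{b(\theta^0), b(\theta_*)\} \leq \varepsilon^*] = 0.$$

We can then conclude that 

$$\mathbf{1}[d\{\eta(y), \eta(z(\theta_*))\} \leq \varepsilon^*] = o_p(1).$$

Moreover, from equation (19) 

$$\inf_{\theta\in A_\delta} d\{b(\theta^0), b(\theta)\} = d\{b(\theta^0), b(\theta_*)\} \geq \varepsilon_* > \varepsilon^*, \tag{22}$$

and so it follows from (21) and equation (22) that 

$$\sup_{\theta\in A_\delta} \mathbf{1}[d\{\eta(y), \eta(z(\theta))\} \leq \varepsilon^*] = o_p(1).$$

Therefore, for $\varepsilon < \varepsilon_*$ and a corresponding $N > N(\varepsilon)$ number of simulated draws, which exists 
by Part 2, $\Pr_\varepsilon(\theta \in A_\delta|\eta(y)) = o_p(1)$ and the result follows. ■ 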


Proof of Corollary 1. We have two cases to consider: one, the vector $b(\theta^i)$ is one-to-one in 
$\theta^i$ and two, only the sub-vector $b_1(\theta^i)$ is one-to-one in $\theta^i$. Clearly, if the first case obtains then 
the result follows from Theorem 1 and so we can focus on the latter case. 

For the second case then, by the triangle inequality 

$$d\{\eta(y), \eta(z^i)\} \leq d\{\eta(y), b(\theta^i)\} + d\{b(\theta^i), \eta(z^i)\}. \tag{23}$$

Using the same arguments as in Theorem 1, equation (23) can be restated as 

$$d\{\eta(y), \eta(z^i)\} \leq d\{b(\theta^0), b(\theta^i)\} + o_p(1).$$

By assumption $d\{\cdot,\cdot\}$ is an induced metric, and so for vectors $x$ and $z$ (of the same dimension) 
$d\{x, z\} = 0$ if and only if $x = z$. Using this fact we see that 

$$d\{b(\theta^0), b(\theta)\} = 0 \iff \begin{pmatrix} b_1(\theta^0) \\ b_2(\theta^0) \end{pmatrix} = \begin{pmatrix} b_1(\theta) \\ b_2(\theta) \end{pmatrix}.$$

The key observation is that the set $A_\theta := \{\theta \in \Theta : \|b_2(\theta^0) - b_2(\theta)\| = 0\}$ always includes the 
point $\theta = \theta^0$, but can include other points since $b_2(\cdot)$ need not be one-to-one. However, by 
[C2(2)], we know that $b_1(\cdot)$ is one-to-one and so the only value of $\theta$ for which 

$$\begin{pmatrix} b_1(\theta^0) \\ b_2(\theta^0) \end{pmatrix} - \begin{pmatrix} b_1(\theta) \\ b_2(\theta) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

is $\theta = \theta^0$. The result then follows by the same arguments as in Theorem 1. ■ 
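
The role of the one-to-one sub-vector can be seen in a small simulation. The setup is hypothetical and chosen only for illustration: data are $T$ draws from $N(\theta, 1)$, the first summary is the sample mean (limit $b_1(\theta) = \theta$, one-to-one) and the second is the mean square minus one (limit $b_2(\theta) = \theta^2$, not one-to-one on a prior that is uniform on $[-2, 2]$).

```python
import math
import random
from statistics import fmean

def summaries(data):
    """eta_1 = sample mean (limit theta); eta_2 = mean square minus 1 (limit theta^2)."""
    return fmean(data), fmean(v * v for v in data) - 1.0

rng = random.Random(7)
theta0, T, eps = 1.0, 200, 0.3
eta_y = summaries([rng.gauss(theta0, 1.0) for _ in range(T)])

kept_b2_only, kept_both = [], []
for _ in range(2000):
    theta = rng.uniform(-2.0, 2.0)                       # prior draw
    eta_z = summaries([rng.gauss(theta, 1.0) for _ in range(T)])
    d1 = eta_y[0] - eta_z[0]
    d2 = eta_y[1] - eta_z[1]
    if abs(d2) <= eps:                                   # match eta_2 only
        kept_b2_only.append(theta)
    if math.hypot(d1, d2) <= eps:                        # match (eta_1, eta_2) jointly
        kept_both.append(theta)

# Share of accepted draws on the wrong side of zero under each rule.
share_negative_b2_only = sum(t < 0 for t in kept_b2_only) / len(kept_b2_only)
n_negative_both = sum(t < 0 for t in kept_both)
```

Matching on the second summary alone accepts draws near both $\theta^0$ and $-\theta^0$, while adding the one-to-one component removes the spurious mode, which is the mechanism the corollary formalises.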


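Proof of Theorem 2. By the triangle inequality 

$$d\{\beta(y), \beta(z^i)\} \leq d\{\beta(y), b(\theta^i)\} + d\{b(\theta^i), \beta(z^i)\}. \tag{24}$$

Before proceeding further we must show that, under the maintained assumptions,$^{11}$ 

$$\sup_{\theta\in\Theta} \|\beta(z(\theta)) - b(\theta)\| = o_p(1). \tag{25}$$

Define the following terms: 

$$\tilde{Q}(\theta, \beta) = Q(z(\theta), \beta) - Q_\infty(\theta, b(\theta)),$$
$$\tilde{Q}_\infty(\theta, \beta) = Q_\infty(\theta, \beta) - Q_\infty(\theta, b(\theta)).$$

Note that, by [G1(1)], for all $\delta > 0$, if $\sup_{\theta\in\Theta} \|\beta(z(\theta)) - b(\theta)\| > \delta$, there exists $\epsilon(\delta) > 0$, such 
that 

$$\sup_{\theta\in\Theta} \|Q_\infty(\theta, \beta(z(\theta))) - Q_\infty(\theta, b(\theta))\| = \sup_{\theta\in\Theta} \|\tilde{Q}_\infty(\theta, \beta(z(\theta)))\| > \epsilon(\delta).$$

From here, note that 

$$\Pr\left(\sup_{\theta\in\Theta} \|\beta(z(\theta)) - b(\theta)\| > \delta\right) \leq \Pr\left(\sup_{\theta\in\Theta} \|\tilde{Q}_\infty(\theta, \beta(z(\theta)))\| > \epsilon(\delta)\right).$$

The result in (25) then follows if $\sup_{\theta\in\Theta} \|\tilde{Q}_\infty(\theta, \beta(z(\theta)))\| = o_p(1)$. 

Uniformly in $\theta$, 

$$\begin{aligned}
\|\tilde{Q}_\infty(\theta, \beta(z(\theta)))\| &\leq \|\tilde{Q}_\infty(\theta, \beta(z(\theta))) - \tilde{Q}(\theta, \beta(z(\theta)))\| + \|\tilde{Q}(\theta, \beta(z(\theta)))\| \\
&= \|Q_\infty(\theta, \beta(z(\theta))) - Q(z(\theta), \beta(z(\theta)))\| + \|\tilde{Q}(\theta, \beta(z(\theta)))\| \\
&\leq \sup_{\beta\in B} \|Q_\infty(\theta, \beta) - Q(z(\theta), \beta)\| + \|\tilde{Q}(\theta, \beta(z(\theta)))\| \\
&\leq o_p(1) + \|\tilde{Q}(\theta, \beta(z(\theta)))\|.
\end{aligned} \tag{26}$$

The first inequality follows from the triangle inequality, the second line from the definition of 
$\tilde{Q}_\infty(\theta, \beta)$ and $\tilde{Q}(\theta, \beta)$, the third from the definition of sup, and the last from Assumption 
[G1(2)]. 

From (26), the result follows if 

$$\sup_{\theta\in\Theta} \|\tilde{Q}(\theta, \beta(z(\theta)))\| = o_p(1).$$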


$^{11}$ Recall that $d\{\cdot,\cdot\}$ is an induced metric and hence convergence in $\|\cdot\|$ will imply convergence in $d\{\cdot,\cdot\}$. 
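By the definition of $\beta(z(\theta))$, uniformly in $\theta$, 

$$\begin{aligned}
\|\tilde{Q}(\theta, \beta(z(\theta)))\| &\leq \inf_{\beta\in B} \|\tilde{Q}(\theta, \beta)\| + o_p(1) \\
&\leq \inf_{\beta\in B} \|\tilde{Q}(\theta, \beta) - \tilde{Q}_\infty(\theta, \beta)\| + \inf_{\beta\in B} \|\tilde{Q}_\infty(\theta, \beta)\| + o_p(1) \\
&\leq \sup_{\beta\in B} \|Q(z(\theta), \beta) - Q_\infty(\theta, \beta)\| + 0 + o_p(1) \\
&\leq o_p(1),
\end{aligned} \tag{27}$$

with the last inequality following from [G1(2)]. Combining equations (26) and (27) yields 
$\sup_{\theta\in\Theta} \|\beta(z(\theta)) - b(\theta)\| = o_p(1)$, and we can conclude 

$$d\{\beta(z^i), b(\theta^i)\} \leq \sup_{\theta\in\Theta} d\{\beta(z(\theta)), b(\theta)\} = o_p(1). \tag{28}$$

Applying equation (28) to equation (24) we have 

$$d\{\beta(y), \beta(z^i)\} \leq d\{\beta(y), b(\theta^i)\} + o_p(1). \tag{29}$$

Applying the triangle inequality to $d\{\beta(y), b(\theta^i)\}$ yields 

$$d\{\beta(y), b(\theta^i)\} \leq d\{\beta(y), b(\theta^0)\} + d\{b(\theta^0), b(\theta^i)\} + o_p(1).$$

By [G1] and [G2], $\beta(y) - b(\theta^0) = o_p(1)$, and so 

$$d\{\beta(y), b(\theta^i)\} \leq d\{b(\theta^0), b(\theta^i)\} + o_p(1). \tag{30}$$

From (29) and (30) we thus have 

$$d\{\beta(y), \beta(z^i)\} \leq d\{b(\theta^0), b(\theta^i)\} + o_p(1).$$

For fixed $\varepsilon > 0$, as $T \to \infty$ a value $\theta^i$ will be selected if and only if 

$$d\{b(\theta^0), b(\theta^i)\} + o_p(1) \leq \varepsilon.$$

By [G3] the only value of $\theta^i \in \Theta$ for which $b(\theta^i) = b(\theta^0)$ is $\theta^i = \theta^0$. Therefore, the only 
value of $\theta^i$ satisfying $d\{b(\theta^0), b(\theta^i)\} \leq \varepsilon$ as $\varepsilon \to 0$ is $\theta^0$. The result follows using similar 
arguments to those of Theorem 1. ■ 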
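
The uniform convergence in (25), on which the argument rests, can be checked numerically in a simple case. The sketch below makes several simplifying assumptions that are not taken from the paper: the auxiliary criterion is $Q(z, \beta) = \frac{1}{T}\sum_t (z_t - \beta)^2$, whose minimiser is the sample mean, the simulated data are $N(\theta, 1)$ so that $b(\theta) = \theta$, and a finite grid replaces the supremum over the compact set $\Theta$.

```python
import random
from statistics import fmean

def beta_hat(z):
    """Minimiser of the OLS criterion Q(z, beta) = mean((z_t - beta)^2): the sample mean."""
    return fmean(z)

def sup_gap(T, rng, grid):
    """Max over the theta grid of |beta_hat(z(theta)) - b(theta)|, with b(theta) = theta."""
    gaps = []
    for theta in grid:
        z = [rng.gauss(theta, 1.0) for _ in range(T)]
        gaps.append(abs(beta_hat(z) - theta))
    return max(gaps)

rng = random.Random(3)
grid = [-2.0 + 0.2 * k for k in range(21)]   # finite stand-in for the compact set Theta
gap_small_T = sup_gap(100, rng, grid)
gap_large_T = sup_gap(10_000, rng, grid)
```

The grid maximum shrinks as $T$ grows, which is the behaviour (25) requires; a criterion without a closed-form minimiser would need a numerical optimiser in place of `beta_hat`.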